sgkit.Tajimas_D#
- sgkit.Tajimas_D(ds, *, variant_allele_count='variant_allele_count', stat_diversity='stat_diversity', merge=True)#
Compute Tajimas’ D for a genotype call dataset.
By default, values of this statistic are calculated per variant. To compute values in windows, call
window_by_position()
orwindow_by_variant()
before calling this function.- Parameters:
- ds
Dataset
Genotype call dataset.
- variant_allele_count
Hashable
(default:'variant_allele_count'
) Variant allele count variable to use or calculate. Defined by
sgkit.variables.variant_allele_count_spec
. If the variable is not present inds
, it will be computed usingcount_variant_alleles()
.- stat_diversity
Hashable
(default:'stat_diversity'
) Diversity variable to use or calculate. Defined by
sgkit.variables.stat_diversity_spec
. If the variable is not present inds
, it will be computed usingdiversity()
.- merge
bool
(default:True
) If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- ds
- Return type:
- Returns:
: A dataset containing the Tajimas’ D value, as defined by
sgkit.variables.stat_Tajimas_D_spec
. Shape (variants, cohorts), or (windows, cohorts) if windowing information is available.
Warning
This method does not currently support datasets that are chunked along the samples dimension.
Examples
>>> import numpy as np >>> import sgkit as sg >>> import xarray as xr >>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts >>> sample_cohort = np.repeat([0, 1], ds.sizes["samples"] // 2) >>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> sg.Tajimas_D(ds)["stat_Tajimas_D"].values array([[0.88883234, 2.18459998], [2.18459998, 0.88883234], [2.18459998, 2.18459998], [0.88883234, 0.88883234], [0.88883234, 0.88883234]])
>>> # Divide into windows of size three (variants) >>> ds = sg.window_by_variant(ds, size=3) >>> sg.Tajimas_D(ds)["stat_Tajimas_D"].values array([[2.40517586, 2.40517586], [1.10393559, 1.10393559]])