sgkit.Tajimas_D¶
-
sgkit.
Tajimas_D
(ds, *, variant_allele_count='variant_allele_count', stat_diversity='stat_diversity', merge=True)¶ Compute Tajimas’ D for a genotype call dataset.
By default, values of this statistic are calculated per variant. To compute values in windows, call
window()
before calling this function.- Parameters
- ds :
Dataset
Dataset
Genotype call dataset.
- variant_allele_count :
Hashable
Hashable
(default:'variant_allele_count'
) Variant allele count variable to use or calculate. Defined by
sgkit.variables.variant_allele_count_spec
. If the variable is not present inds
, it will be computed usingcount_variant_alleles()
.- stat_diversity :
Hashable
Hashable
(default:'stat_diversity'
) Diversity variable to use or calculate. Defined by
sgkit.variables.stat_diversity_spec
. If the variable is not present inds
, it will be computed usingdiversity()
.- merge :
bool
bool
(default:True
) If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- ds :
- Return type
- Returns
A dataset containing the Tajimas’ D value, as defined by
sgkit.variables.stat_Tajimas_D_spec
. Shape (variants, cohorts), or (windows, cohorts) if windowing information is available.
Warning
This method does not currently support datasets that are chunked along the samples dimension.
Examples
>>> import numpy as np >>> import sgkit as sg >>> import xarray as xr >>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts >>> sample_cohort = np.repeat([0, 1], ds.dims["samples"] // 2) >>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> sg.Tajimas_D(ds)["stat_Tajimas_D"].values array([[-3.35891429, -2.96698697], [-2.96698697, -3.35891429], [-2.96698697, -2.96698697], [-3.35891429, -3.35891429], [-3.35891429, -3.35891429]])
>>> # Divide into windows of size three (variants) >>> ds = sg.window(ds, size=3) >>> sg.Tajimas_D(ds)["stat_Tajimas_D"].values array([[-0.22349574, -0.22349574], [-2.18313233, -2.18313233]])