sgkit.Fst
- sgkit.Fst(ds, *, estimator=None, stat_divergence='stat_divergence', merge=True)
Compute Fst between pairs of cohorts.
By default, values of this statistic are calculated per variant. To compute values in windows, call window_by_position() or window_by_variant() before calling this function.
- Parameters:
  - ds : Dataset
    Genotype call dataset.
  - estimator : Optional[str] (default: None)
    Determines the formula to use for computing Fst. If None (the default) or Hudson, Fst is calculated using the method of Hudson (1992) as elaborated by Bhatia et al. (2013), which is the same estimator as scikit-allel. The other supported estimator is Nei (1986), which is the same estimator as tskit.
  - stat_divergence : Hashable (default: 'stat_divergence')
    Divergence variable to use or calculate. Defined by sgkit.variables.stat_divergence_spec. If the variable is not present in ds, it will be computed using divergence().
  - merge : bool (default: True)
    If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.
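As a rough illustration of how the two estimator choices differ, the sketch below computes both from a pairwise divergence matrix entry, using only the standard library. The function names (`fst_hudson`, `fst_nei`) and the input values are hypothetical, and the formulas are the commonly cited forms of the Hudson and Nei estimators expressed in terms of within-cohort (`d_ii`, `d_jj`) and between-cohort (`d_ij`) divergence. It is a sketch under those assumptions, not a copy of sgkit's implementation.

```python
import math


def fst_hudson(d_ii: float, d_jj: float, d_ij: float) -> float:
    """Hudson-style Fst: 1 - (mean within-cohort divergence) / (between-cohort divergence)."""
    if d_ij == 0.0:
        return math.nan  # undefined when the cohorts show no between-cohort divergence
    return 1.0 - ((d_ii + d_jj) / 2.0) / d_ij


def fst_nei(d_ii: float, d_jj: float, d_ij: float) -> float:
    """Nei-style Fst: 1 - 2 * (d_ii + d_jj) / (d_ii + 2 * d_ij + d_jj)."""
    den = d_ii + 2.0 * d_ij + d_jj
    if den == 0.0:
        return math.nan  # undefined when there is no diversity at all
    return 1.0 - 2.0 * (d_ii + d_jj) / den


# Hypothetical divergence values for a single variant and one pair of cohorts.
d_ii, d_jj, d_ij = 0.5, 0.5, 0.625

hudson_value = fst_hudson(d_ii, d_jj, d_ij)  # approximately 0.2
nei_value = fst_nei(d_ii, d_jj, d_ij)        # approximately 0.1111

print(f"Hudson: {hudson_value:.4f}")
print(f"Nei:    {nei_value:.4f}")
```

For the same inputs, the two estimators generally give different values, so results computed with different `estimator` settings should not be compared directly.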
- Return type: Dataset
- Returns: A dataset containing the Fst value between pairs of cohorts, as defined by sgkit.variables.stat_Fst_spec. Shape (variants, cohorts, cohorts), or (windows, cohorts, cohorts) if windowing information is available.
Warning
This method does not currently support datasets that are chunked along the samples dimension.
Examples
>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)

>>> # Divide samples into two cohorts
>>> sample_cohort = np.repeat([0, 1], ds.sizes["samples"] // 2)
>>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")

>>> sg.Fst(ds)["stat_Fst"].values
array([[[        nan, -0.16666667],
        [-0.16666667,         nan]],

       [[        nan, -0.16666667],
        [-0.16666667,         nan]],

       [[        nan, -0.33333333],
        [-0.33333333,         nan]],

       [[        nan, -0.33333333],
        [-0.33333333,         nan]],

       [[        nan,  0.2       ],
        [ 0.2       ,         nan]]])

>>> # Divide into windows of size three (variants)
>>> ds = sg.window_by_variant(ds, size=3)
>>> sg.Fst(ds)["stat_Fst"].values
array([[[        nan, -0.22222222],
        [-0.22222222,         nan]],

       [[        nan,  0.        ],
        [ 0.        ,         nan]]])
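The windowed values above are not simple averages of the per-variant values. The following standard-library sketch shows one way to see this, assuming a Hudson-style ratio in which within- and between-cohort divergences are summed over the window before the ratio is taken. That assumption is an inference from the numbers shown, not something this page states. The per-variant divergence pairs are hypothetical values chosen to reproduce the per-variant results -0.3333 and 0.2 reported for the last two variants; they are not read from the simulated dataset.

```python
def hudson(within: float, between: float) -> float:
    """Hudson-style Fst from mean within-cohort and between-cohort divergence."""
    return 1.0 - within / between


# Hypothetical (mean within-cohort, between-cohort) divergence for two variants
# that fall into the same window.
window_variants = [(0.5, 0.375), (0.5, 0.625)]

# Per-variant Fst: approximately -0.3333 and 0.2.
per_variant_fst = [hudson(w, b) for w, b in window_variants]

# Windowed Fst as a ratio of sums: divergences are accumulated first.
total_within = sum(w for w, _ in window_variants)
total_between = sum(b for _, b in window_variants)
ratio_of_sums = hudson(total_within, total_between)

# A naive mean of the per-variant Fst values gives a different answer.
mean_of_ratios = sum(per_variant_fst) / len(per_variant_fst)

print(per_variant_fst)
print(ratio_of_sums)
print(mean_of_ratios)
```

Under these assumptions the ratio of sums is 0, which agrees with the second window in the output above, whereas the mean of the two per-variant values is about -0.067.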