sgkit.Fst
sgkit.Fst(ds, *, estimator=None, stat_divergence='stat_divergence', merge=True)
Compute Fst between pairs of cohorts.
By default, values of this statistic are calculated per variant. To compute values in windows, call window_by_position() or window_by_variant() before calling this function.
- Parameters
- ds : Dataset
Genotype call dataset.
- estimator : Optional[str] (default: None)
Determines the formula to use for computing Fst. If None (the default) or 'Hudson', Fst is calculated using the method of Hudson (1992) as elaborated by Bhatia et al. (2013), the same estimator as scikit-allel. Other supported estimators include 'Nei' (Nei, 1986), the same estimator as tskit.
- stat_divergence : Hashable (default: 'stat_divergence')
Divergence variable to use or calculate. Defined by sgkit.variables.stat_divergence_spec. If the variable is not present in ds, it will be computed using divergence().
- merge : bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.
- Return type
Dataset
- Returns
A dataset containing the Fst value between pairs of cohorts, as defined by sgkit.variables.stat_Fst_spec. Shape (variants, cohorts, cohorts), or (windows, cohorts, cohorts) if windowing information is available.
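The two estimators differ only in how they combine within-cohort diversity (pi) with between-cohort divergence (d). The NumPy sketch below shows the standard textbook formulas: Hudson Fst = 1 - pi_w / d_ij with pi_w = (pi_i + pi_j) / 2, and Nei Fst = 1 - pi_w / pi_T with pi_T = (pi_i + pi_j + 2 * d_ij) / 4. It assumes (this page does not spell it out) a divergence array of shape (..., cohorts, cohorts) whose diagonal holds each cohort's diversity and whose off-diagonal entries hold pairwise divergence. The function name fst_from_divergence is hypothetical; this is an illustration of the formulas, not sgkit's implementation.

```python
import numpy as np


def fst_from_divergence(div, estimator=None):
    """Hypothetical helper: pairwise Fst from a divergence array.

    ASSUMPTION: div has shape (..., cohorts, cohorts); div[..., i, i] is the
    diversity of cohort i and div[..., i, j] (i != j) is the divergence
    between cohorts i and j.
    """
    div = np.asarray(div, dtype=float)
    # Within-cohort diversity pi_i, shape (..., cohorts)
    pi = np.diagonal(div, axis1=-2, axis2=-1)
    # pi_i + pi_j for every pair of cohorts, shape (..., cohorts, cohorts)
    pi_sum = pi[..., :, None] + pi[..., None, :]
    # Zero divergence gives nan/inf; silence the NumPy warnings for those cells
    with np.errstate(divide="ignore", invalid="ignore"):
        if estimator in (None, "Hudson"):
            # Hudson: 1 - mean within-cohort diversity / between-cohort divergence
            fst = 1.0 - (pi_sum / 2.0) / div
        elif estimator == "Nei":
            # Nei: 1 - mean within / mean total, total = (pi_i + pi_j + 2*d_ij) / 4
            fst = 1.0 - 2.0 * pi_sum / (pi_sum + 2.0 * div)
        else:
            raise ValueError(f"unknown estimator: {estimator!r}")
    # Fst of a cohort with itself is undefined, so mask the diagonal with NaN
    n = div.shape[-1]
    fst[..., np.arange(n), np.arange(n)] = np.nan
    return fst
```

The diagonal is set to NaN because Fst of a cohort with itself is undefined, which matches the nan entries on the diagonal of the example output.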
Warning
This method does not currently support datasets that are chunked along the samples dimension.
Examples
>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts
>>> sample_cohort = np.repeat([0, 1], ds.dims["samples"] // 2)
>>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> sg.Fst(ds)["stat_Fst"].values
array([[[        nan, -0.16666667],
        [-0.16666667,         nan]],
       [[        nan, -0.16666667],
        [-0.16666667,         nan]],
       [[        nan, -0.33333333],
        [-0.33333333,         nan]],
       [[        nan, -0.33333333],
        [-0.33333333,         nan]],
       [[        nan,  0.2       ],
        [ 0.2       ,         nan]]])
>>> # Divide into windows of size three (variants)
>>> ds = sg.window_by_variant(ds, size=3)
>>> sg.Fst(ds)["stat_Fst"].values
array([[[        nan, -0.22222222],
        [-0.22222222,         nan]],
       [[        nan,  0.        ],
        [ 0.        ,         nan]]])
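The windowed output above is not simply the average of the per-variant values: for the second window (variants 4 and 5), the mean of -0.33333333 and 0.2 is about -0.067, yet the windowed Fst is 0. That is consistent with summing diversity and divergence over each window before taking the ratio. The sketch below illustrates this ratio-of-sums approach for a single pair of cohorts under the Hudson formula. Treating it as sgkit's exact windowing rule is an assumption, and windowed_hudson_fst and its window_starts argument are hypothetical.

```python
import numpy as np


def windowed_hudson_fst(pi_i, pi_j, d_ij, window_starts):
    """Hypothetical helper: windowed Hudson Fst for one pair of cohorts.

    pi_i, pi_j: per-variant diversity of each cohort, shape (variants,)
    d_ij: per-variant divergence between the cohorts, shape (variants,)
    window_starts: strictly increasing start index of each window.

    ASSUMPTION: windowed Fst is a ratio of sums (sum the components over the
    window, then divide), not a mean of per-variant Fst values.
    """
    pi_i, pi_j, d_ij = (np.asarray(a, dtype=float) for a in (pi_i, pi_j, d_ij))
    # np.add.reduceat sums each slice [start_k, start_{k+1}); the last runs to the end
    within = np.add.reduceat((pi_i + pi_j) / 2.0, window_starts)
    between = np.add.reduceat(d_ij, window_starts)
    return 1.0 - within / between


# Usage: five variants split into windows [0, 3) and [3, 5)
pi = np.array([0.2, 0.2, 0.4, 0.4, 0.1])
d = np.array([0.4, 0.4, 0.4, 0.2, 0.4])
print(windowed_hudson_fst(pi, pi, d, [0, 3]))
```

Summing before dividing weights each variant by its divergence, so variants with little divergence cannot dominate a window the way they can when per-variant ratios are averaged.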