sgkit.pbs#
- sgkit.pbs(ds, *, stat_Fst='stat_Fst', cohorts=None, merge=True)#
Compute the population branching statistic (PBS) between cohort triples.
By default, values of this statistic are calculated per variant. To compute values in windows, call
window_by_position()
orwindow_by_variant()
before calling this function.- Parameters:
- ds
Dataset
Genotype call dataset.
- stat_Fst
Hashable
(default:'stat_Fst'
) Fst variable to use or calculate. Defined by
sgkit.variables.stat_Fst_spec
. If the variable is not present inds
, it will be computed usingFst()
.- cohorts
Sequence
[Union
[Tuple
[int
,int
,int
],Tuple
[str
,str
,str
]]] |None
Optional
[Sequence
[Union
[Tuple
[int
,int
,int
],Tuple
[str
,str
,str
]]]] (default:None
) The cohort triples to compute statistics for, specified as a sequence of tuples of cohort indexes or IDs. None (the default) means compute statistics for all cohorts.
- merge
bool
(default:True
) If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- ds
- Return type:
- Returns:
: A dataset containing the PBS value between cohort triples, as defined by
sgkit.variables.stat_pbs_spec
. Shape (variants, cohorts, cohorts, cohorts), or (windows, cohorts, cohorts, cohorts) if windowing information is available.
Warning
This method does not currently support datasets that are chunked along the samples dimension.
Examples
>>> import numpy as np >>> import sgkit as sg >>> import xarray as xr >>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=6)
>>> # Divide samples into three named cohorts >>> n_cohorts = 3 >>> sample_cohort = np.repeat(range(n_cohorts), ds.sizes["samples"] // n_cohorts) >>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples") >>> cohort_names = [f"co_{i}" for i in range(n_cohorts)] >>> ds = ds.assign_coords({"cohorts_0": cohort_names, "cohorts_1": cohort_names, "cohorts_2": cohort_names})
>>> # Divide into two windows of size three (variants) >>> ds = sg.window_by_variant(ds, size=3) >>> sg.pbs(ds)["stat_pbs"].sel(cohorts_0="co_0", cohorts_1="co_1", cohorts_2="co_2").values array([ 0. , -0.160898])