sgkit.pbs#

sgkit.pbs(ds, *, stat_Fst='stat_Fst', cohorts=None, merge=True)#

Compute the population branching statistic (PBS) between cohort triples.

By default, values of this statistic are calculated per variant. To compute values in windows, call window_by_position() or window_by_variant() before calling this function.

Parameters:
ds Dataset

Genotype call dataset.

stat_Fst Hashable (default: 'stat_Fst')

Fst variable to use or calculate. Defined by sgkit.variables.stat_Fst_spec. If the variable is not present in ds, it will be computed using Fst().

cohorts Sequence[Union[Tuple[int, int, int], Tuple[str, str, str]]] | NoneOptional[Sequence[Union[Tuple[int, int, int], Tuple[str, str, str]]]] (default: None)

The cohort triples to compute statistics for, specified as a sequence of tuples of cohort indexes or IDs. None (the default) means compute statistics for all cohorts.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

: A dataset containing the PBS value between cohort triples, as defined by sgkit.variables.stat_pbs_spec. Shape (variants, cohorts, cohorts, cohorts), or (windows, cohorts, cohorts, cohorts) if windowing information is available.

Warning

This method does not currently support datasets that are chunked along the samples dimension.

Examples

>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=6)
>>> # Divide samples into three named cohorts
>>> n_cohorts = 3
>>> sample_cohort = np.repeat(range(n_cohorts), ds.sizes["samples"] // n_cohorts)
>>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> cohort_names = [f"co_{i}" for i in range(n_cohorts)]
>>> ds = ds.assign_coords({"cohorts_0": cohort_names, "cohorts_1": cohort_names, "cohorts_2": cohort_names})
>>> # Divide into two windows of size three (variants)
>>> ds = sg.window_by_variant(ds, size=3)
>>> sg.pbs(ds)["stat_pbs"].sel(cohorts_0="co_0", cohorts_1="co_1", cohorts_2="co_2").values 
array([ 0.      , -0.160898])