sgkit.Garud_H#

sgkit.Garud_H(ds, *, call_genotype='call_genotype', sample_cohort='sample_cohort', cohorts=None, merge=True)#

Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. (2015).

By default, values of this statistic are calculated across all variants. To compute values in windows, call window_by_position() or window_by_variant() before calling this function.

Parameters
ds : Dataset

Genotype call dataset.

call_genotype : Hashable (default: 'call_genotype')

Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. Must be present in ds.

sample_cohort : Hashable (default: 'sample_cohort')

Input variable name holding sample_cohort as defined by sgkit.variables.sample_cohort_spec.

cohorts : Sequence[Union[int, str]] | NoneOptional[Sequence[Union[int, str]]] (default: None)

The cohorts to compute statistics for, specified as a sequence of cohort indexes or IDs. None (the default) means compute statistics for all cohorts.

merge : bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type

Dataset

Returns

A dataset containing the following variables:

Raises

NotImplementedError – If the dataset is not diploid.

Warning

This function is currently only implemented for diploid datasets.

Examples

>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts
>>> sample_cohort = np.repeat([0, 1], ds.dims["samples"] // 2)
>>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> # Divide into windows of size three (variants)
>>> ds = sg.window_by_variant(ds, size=3, step=3)
>>> gh = sg.Garud_H(ds)
>>> gh["stat_Garud_h1"].values 
array([[0.25 , 0.375],
    [0.375, 0.375]])
>>> gh["stat_Garud_h12"].values 
array([[0.375, 0.625],
    [0.625, 0.625]])
>>> gh["stat_Garud_h123"].values 
array([[0.625, 1.   ],
    [1.   , 1.   ]])
>>> gh["stat_Garud_h2_h1"].values 
array([[0.75      , 0.33333333],
    [0.33333333, 0.33333333]])