sgkit.Garud_H#

sgkit.Garud_H(ds, *, call_genotype='call_genotype', sample_cohort='sample_cohort', cohorts=None, merge=True)#

Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. (2015).

This method requires a windowed dataset. To window a dataset, call window_by_position() or window_by_variant() before calling this function.

Parameters:
ds Dataset

Genotype call dataset.

call_genotype Hashable (default: 'call_genotype')

Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. Must be present in ds.

sample_cohort Hashable (default: 'sample_cohort')

Input variable name holding sample_cohort as defined by sgkit.variables.sample_cohort_spec.

cohorts Sequence[Union[int, str]] | NoneOptional[Sequence[Union[int, str]]] (default: None)

The cohorts to compute statistics for, specified as a sequence of cohort indexes or IDs. None (the default) means compute statistics for all cohorts.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

: A dataset containing the following variables:

Raises:

Warning

This function is currently only implemented for diploid datasets.

Examples

>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts
>>> sample_cohort = np.repeat([0, 1], ds.sizes["samples"] // 2)
>>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> # Divide into windows of size three (variants)
>>> ds = sg.window_by_variant(ds, size=3, step=3)
>>> gh = sg.Garud_H(ds)
>>> gh["stat_Garud_h1"].values 
array([[0.25 , 0.375],
    [0.375, 0.375]])
>>> gh["stat_Garud_h12"].values 
array([[0.375, 0.625],
    [0.625, 0.625]])
>>> gh["stat_Garud_h123"].values 
array([[0.625, 1.   ],
    [1.   , 1.   ]])
>>> gh["stat_Garud_h2_h1"].values 
array([[0.75      , 0.33333333],
    [0.33333333, 0.33333333]])