sgkit.Garud_H

sgkit.Garud_H(ds, *, call_genotype='call_genotype', sample_cohort='sample_cohort', cohorts=None, merge=True)

Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. (2015).

This function requires a windowed dataset. To window a dataset, call window_by_position() or window_by_variant() on it before calling this function.
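The four statistics can be written in terms of the haplotype frequencies within one window and cohort, sorted from most to least common. The following is a minimal NumPy sketch of those definitions, not sgkit's implementation; the helper name `garud_h_from_frequencies` is hypothetical and it assumes the frequencies sum to 1.

```python
import numpy as np


def garud_h_from_frequencies(freqs):
    """Return (H1, H12, H123, H2/H1) for one window's haplotype frequencies.

    Hypothetical helper for illustration; not part of sgkit.
    """
    # Sort from most to least common and pad with zeros so p[0..2] always exist.
    p = np.sort(np.asarray(freqs, dtype=float))[::-1]
    p = np.pad(p, (0, max(0, 3 - p.size)))

    h1 = np.sum(p**2)  # haplotype homozygosity
    h12 = np.sum(p[:2]) ** 2 + np.sum(p[2:] ** 2)  # pool the two most common
    h123 = np.sum(p[:3]) ** 2 + np.sum(p[3:] ** 2)  # pool the three most common
    h2 = h1 - p[0] ** 2  # homozygosity excluding the most common haplotype
    return h1, h12, h123, h2 / h1


# Four distinct haplotypes, each seen once among four chromosomes.
all_distinct = garud_h_from_frequencies([0.25, 0.25, 0.25, 0.25])

# One haplotype seen twice, two others seen once each.
one_repeated = garud_h_from_frequencies([0.5, 0.25, 0.25])
```

With two diploid samples per cohort there are four haplotypes per window, so these two frequency patterns are consistent with the two distinct sets of values that appear in the Examples section.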

Parameters:

ds Dataset: Genotype call dataset.
call_genotype Hashable (default: 'call_genotype'): Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. Must be present in ds.
sample_cohort Hashable (default: 'sample_cohort'): Input variable name holding sample_cohort as defined by sgkit.variables.sample_cohort_spec.
cohorts Optional[Sequence[Union[int, str]]] (default: None): The cohorts to compute statistics for, specified as a sequence of cohort indexes or IDs. None (the default) means compute statistics for all cohorts.
merge bool (default: True): If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing the following variables:

stat_Garud_h1 (windows, cohorts): Garud H1 statistic.
Defined by sgkit.variables.stat_Garud_h1_spec.
stat_Garud_h12 (windows, cohorts): Garud H12 statistic.
Defined by sgkit.variables.stat_Garud_h12_spec.
stat_Garud_h123 (windows, cohorts): Garud H123 statistic.
Defined by sgkit.variables.stat_Garud_h123_spec.
stat_Garud_h2_h1 (windows, cohorts): Garud H2/H1 statistic.
Defined by sgkit.variables.stat_Garud_h2_h1_spec.

Raises:

NotImplementedError – If the dataset is not diploid.
ValueError – If the dataset is not windowed.

Warning

This function is currently only implemented for diploid datasets.
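In a diploid dataset each sample contributes two haplotypes per window, one per chromosome copy. As an illustration of what that means for the input, the sketch below counts distinct haplotypes in a single window. It is not sgkit's implementation: the helper name `haplotype_frequencies` is hypothetical, and it assumes phased calls with no missing values in an array of shape (variants, samples, ploidy).

```python
import numpy as np


def haplotype_frequencies(calls):
    """Frequencies of distinct haplotypes in one window, most common first.

    Hypothetical helper for illustration; `calls` holds allele indexes with
    shape (variants, samples, ploidy).
    """
    calls = np.asarray(calls)
    n_variants = calls.shape[0]
    # One row per chromosome copy: shape (samples * ploidy, variants).
    haplotypes = calls.reshape(n_variants, -1).T
    # Count how many times each distinct row (haplotype) occurs.
    _, counts = np.unique(haplotypes, axis=0, return_counts=True)
    return np.sort(counts)[::-1] / haplotypes.shape[0]


# Two diploid samples over three variants give four haplotypes.
calls = np.array(
    [
        [[0, 0], [1, 0]],
        [[1, 1], [0, 1]],
        [[0, 0], [0, 0]],
    ]
)
freqs = haplotype_frequencies(calls)
```

Here three of the four chromosome copies carry the alleles (0, 1, 0) and one carries (1, 0, 0), so the window has two distinct haplotypes.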

Examples

>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)

>>> # Divide samples into two cohorts
>>> sample_cohort = np.repeat([0, 1], ds.sizes["samples"] // 2)
>>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")

>>> # Divide into windows of size three (variants)
>>> ds = sg.window_by_variant(ds, size=3, step=3)

>>> gh = sg.Garud_H(ds)
>>> gh["stat_Garud_h1"].values
array([[0.25 , 0.375],
       [0.375, 0.375]])
>>> gh["stat_Garud_h12"].values
array([[0.375, 0.625],
       [0.625, 0.625]])
>>> gh["stat_Garud_h123"].values
array([[0.625, 1.   ],
       [1.   , 1.   ]])
>>> gh["stat_Garud_h2_h1"].values
array([[0.75      , 0.33333333],
       [0.33333333, 0.33333333]])
