sgkit.Garud_H
- sgkit.Garud_H(ds, *, call_genotype='call_genotype', sample_cohort='sample_cohort', cohorts=None, merge=True)
Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. (2015).
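For reference, these are the statistics as defined by Garud et al. (2015). They are written in terms of the frequencies of the distinct haplotypes observed in a window, sorted so that the most common haplotype comes first. This note assumes they are evaluated on plain sample frequencies within each window and cohort.

```latex
% p_1 >= p_2 >= ... >= p_n: frequencies of the n distinct haplotypes in one window and cohort
% (take p_i = 0 for i > n, so empty sums vanish)
\begin{aligned}
H_1     &= \sum_{i=1}^{n} p_i^2, \\
H_{12}  &= (p_1 + p_2)^2 + \sum_{i=3}^{n} p_i^2, \\
H_{123} &= (p_1 + p_2 + p_3)^2 + \sum_{i=4}^{n} p_i^2, \\
H_2     &= H_1 - p_1^2, \qquad \text{reported as the ratio } H_2 / H_1 .
\end{aligned}
```

Take four distinct haplotypes, each at frequency 1/4. The formulas give H1 = 4/16 = 0.25, H12 = (1/2)^2 + 2/16 = 0.375, H123 = (3/4)^2 + 1/16 = 0.625 and H2/H1 = (0.25 - 1/16)/0.25 = 0.75. These are the values in the first entry of each array in the example output.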
This method requires a windowed dataset. To window a dataset, call window_by_position() or window_by_variant() before calling this function.
- Parameters:
  - ds : Dataset
    Genotype call dataset.
  - call_genotype : Hashable (default: 'call_genotype')
    Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. Must be present in ds.
  - sample_cohort : Hashable (default: 'sample_cohort')
    Input variable name holding sample_cohort as defined by sgkit.variables.sample_cohort_spec.
  - cohorts : Sequence[Union[int, str]] | None (default: None)
    The cohorts to compute statistics for, specified as a sequence of cohort indexes or IDs. None (the default) means compute statistics for all cohorts.
  - merge : bool (default: True)
    If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.
- Return type:
  Dataset
- Returns:
  A dataset containing the following variables:
  - stat_Garud_h1 (windows, cohorts): Garud H1 statistic. Defined by sgkit.variables.stat_Garud_h1_spec.
  - stat_Garud_h12 (windows, cohorts): Garud H12 statistic. Defined by sgkit.variables.stat_Garud_h12_spec.
  - stat_Garud_h123 (windows, cohorts): Garud H123 statistic. Defined by sgkit.variables.stat_Garud_h123_spec.
  - stat_Garud_h2_h1 (windows, cohorts): Garud H2/H1 statistic. Defined by sgkit.variables.stat_Garud_h2_h1_spec.
- Raises:
  - NotImplementedError – If the dataset is not diploid.
  - ValueError – If the dataset is not windowed.
Warning
This function is currently only implemented for diploid datasets.
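The following minimal pure-Python sketch makes the (windows, cohorts) layout and the diploid restriction concrete. It is not sgkit's implementation. The following are all hypothetical choices made for illustration: the nested-list layout calls[variant][sample][ploidy], the helper names garud_stats and garud_h_sketch, and the use of NaN for a cohort with no samples. It also assumes that each diploid sample contributes its two haplotypes as stored (calls are read as phased), and that a haplotype is the tuple of alleles across the variants of one window.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Haplotype = Tuple[int, ...]


def garud_stats(haplotypes: Sequence[Haplotype]) -> Tuple[float, float, float, float]:
    """Return (H1, H12, H123, H2/H1) for a non-empty collection of haplotypes."""
    n = len(haplotypes)
    # Frequencies of the distinct haplotypes, most common first.
    freqs = sorted((count / n for count in Counter(haplotypes).values()), reverse=True)
    h1 = sum(f * f for f in freqs)
    h12 = sum(freqs[:2]) ** 2 + sum(f * f for f in freqs[2:])
    h123 = sum(freqs[:3]) ** 2 + sum(f * f for f in freqs[3:])
    h2 = h1 - freqs[0] ** 2
    return h1, h12, h123, h2 / h1


def garud_h_sketch(
    calls: List[List[List[int]]],  # hypothetical layout: calls[variant][sample][ploidy]
    sample_cohort: List[int],  # cohort index of each sample
    windows: List[Tuple[int, int]],  # (start, stop) variant indexes, stop exclusive
    n_cohorts: int,
) -> Dict[str, List[List[float]]]:
    """Return {stat name: values[window][cohort]}, mirroring the (windows, cohorts) shape."""
    names = ("h1", "h12", "h123", "h2_h1")
    result: Dict[str, List[List[float]]] = {name: [] for name in names}
    for start, stop in windows:
        per_cohort: Dict[str, List[float]] = {name: [] for name in names}
        for cohort in range(n_cohorts):
            haplotypes: List[Haplotype] = []
            for sample, sample_c in enumerate(sample_cohort):
                if sample_c != cohort:
                    continue
                # Mirror the documented restriction: only diploid calls are handled.
                if any(len(calls[v][sample]) != 2 for v in range(start, stop)):
                    raise NotImplementedError("only diploid calls are supported")
                # Each of the two haplotypes is the allele sequence across the window.
                for k in range(2):
                    haplotypes.append(tuple(calls[v][sample][k] for v in range(start, stop)))
            if haplotypes:
                stats = garud_stats(haplotypes)
            else:
                stats = (float("nan"),) * 4  # assumption: a cohort with no samples gives NaN
            for name, value in zip(names, stats):
                per_cohort[name].append(value)
        for name in names:
            result[name].append(per_cohort[name])
    return result


# Toy input: 5 variants, 4 diploid samples, cohorts [0, 0, 1, 1], windows of 3 variants.
calls = [
    [[0, 0], [0, 1], [0, 0], [1, 0]],
    [[0, 0], [1, 0], [0, 0], [1, 1]],
    [[0, 1], [0, 0], [0, 0], [1, 1]],
    [[0, 0], [0, 0], [0, 0], [0, 0]],
    [[0, 0], [0, 0], [0, 0], [0, 0]],
]
out = garud_h_sketch(calls, [0, 0, 1, 1], [(0, 3), (3, 5)], n_cohorts=2)
print(out["h1"])  # [[0.25, 0.375], [1.0, 1.0]]
```

On this toy input, cohort 0 of the first window has four distinct haplotypes. It yields H1 = 0.25, H12 = 0.375, H123 = 0.625 and H2/H1 = 0.75. Cohort 1 has haplotype counts 2, 1, 1 and yields 0.375, 0.625, 1.0 and 1/3. These match the first row of each array in the example, although the toy calls are not the simulated dataset used there.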
Examples
>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)

>>> # Divide samples into two cohorts
>>> sample_cohort = np.repeat([0, 1], ds.sizes["samples"] // 2)
>>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")

>>> # Divide into windows of size three (variants)
>>> ds = sg.window_by_variant(ds, size=3, step=3)

>>> gh = sg.Garud_H(ds)
>>> gh["stat_Garud_h1"].values
array([[0.25 , 0.375],
       [0.375, 0.375]])
>>> gh["stat_Garud_h12"].values
array([[0.375, 0.625],
       [0.625, 0.625]])
>>> gh["stat_Garud_h123"].values
array([[0.625, 1.   ],
       [1.   , 1.   ]])
>>> gh["stat_Garud_h2_h1"].values
array([[0.75      , 0.33333333],
       [0.33333333, 0.33333333]])
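The 2 x 2 shape of each array in the example follows from its setup. Five variants cut into windows of three with step three give two windows, and four samples split with np.repeat([0, 1], 2) give two cohorts. The pure-Python sketch below mimics that bookkeeping only. The helpers fixed_variant_windows and repeat_cohorts are hypothetical. Keeping a truncated final window is an assumption that matches the two rows seen here; it is not a description of how window_by_variant behaves in general (for example, when step differs from size).

```python
from typing import List, Sequence, Tuple


def fixed_variant_windows(n_variant: int, size: int, step: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, stop) variant-index windows, stop exclusive.

    The final window is truncated at n_variant rather than dropped.
    """
    return [(start, min(start + size, n_variant)) for start in range(0, n_variant, step)]


def repeat_cohorts(cohort_ids: Sequence[int], samples_per_cohort: int) -> List[int]:
    """Hypothetical pure-Python analogue of np.repeat(cohort_ids, samples_per_cohort)."""
    return [c for c in cohort_ids for _ in range(samples_per_cohort)]


windows = fixed_variant_windows(n_variant=5, size=3, step=3)
sample_cohort = repeat_cohorts([0, 1], samples_per_cohort=4 // 2)
print(windows)  # [(0, 3), (3, 5)]
print(sample_cohort)  # [0, 0, 1, 1]
print((len(windows), len(set(sample_cohort))))  # (2, 2): the (windows, cohorts) shape
```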