sgkit.Garud_H¶
- sgkit.Garud_H(ds, *, call_genotype='call_genotype', sample_cohort='sample_cohort', cohorts=None, merge=True)¶
Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. (2015).
By default, values of this statistic are calculated across all variants. To compute values in windows, call
window_by_position()
orwindow_by_variant()
before calling this function.- Parameters
- ds :
Dataset
Dataset
Genotype call dataset.
- call_genotype :
Hashable
Hashable
(default:'call_genotype'
) Input variable name holding call_genotype as defined by
sgkit.variables.call_genotype_spec
. Must be present inds
.- sample_cohort :
Hashable
Hashable
(default:'sample_cohort'
) Input variable name holding sample_cohort as defined by
sgkit.variables.sample_cohort_spec
.- cohorts :
Sequence
[Union
[int
,str
]] |None
Optional
[Sequence
[Union
[int
,str
]]] (default:None
) The cohorts to compute statistics for, specified as a sequence of cohort indexes or IDs. None (the default) means compute statistics for all cohorts.
- merge :
bool
bool
(default:True
) If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- ds :
- Return type
- Returns
A dataset containing the following variables:
- stat_Garud_h1 (windows, cohorts): Garud H1 statistic.
Defined by
sgkit.variables.stat_Garud_h1_spec
.
- stat_Garud_h12 (windows, cohorts): Garud H12 statistic.
Defined by
sgkit.variables.stat_Garud_h12_spec
.
- stat_Garud_h123 (windows, cohorts): Garud H123 statistic.
Defined by
sgkit.variables.stat_Garud_h123_spec
.
- stat_Garud_h2_h1 (windows, cohorts): Garud H2/H1 statistic.
Defined by
sgkit.variables.stat_Garud_h2_h1_spec
.
- Raises
NotImplementedError – If the dataset is not diploid.
Warning
This function is currently only implemented for diploid datasets.
Examples
>>> import numpy as np >>> import sgkit as sg >>> import xarray as xr >>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts >>> sample_cohort = np.repeat([0, 1], ds.dims["samples"] // 2) >>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> # Divide into windows of size three (variants) >>> ds = sg.window_by_variant(ds, size=3, step=3)
>>> gh = sg.Garud_H(ds) >>> gh["stat_Garud_h1"].values array([[0.25 , 0.375], [0.375, 0.375]]) >>> gh["stat_Garud_h12"].values array([[0.375, 0.625], [0.625, 0.625]]) >>> gh["stat_Garud_h123"].values array([[0.625, 1. ], [1. , 1. ]]) >>> gh["stat_Garud_h2_h1"].values array([[0.75 , 0.33333333], [0.33333333, 0.33333333]])