sgkit.hardy_weinberg_test
- sgkit.hardy_weinberg_test(ds, *, genotype_count=None, ploidy=None, alleles=None, merge=True)
Exact test for Hardy-Weinberg equilibrium (HWE), as described in Wigginton et al. 2005 [1].
- Parameters
- ds : Dataset
Dataset containing genotype calls or precomputed genotype counts.
- genotype_count : Optional[Hashable] (default: None)
Name of the variable containing precomputed genotype counts. If not provided, these counts are computed automatically from the genotype calls. If present, it must correspond to an (N, 3) array, where N is the number of variants and the 3 columns contain heterozygous, homozygous reference, and homozygous alternate counts (in that order) across all samples for a variant.
- ploidy : Optional[int] (default: None)
Genotype ploidy, defaults to the ploidy dimension of the provided dataset. If the ploidy dimension is not present, then this value must be set explicitly. Currently HWE calculations are only supported for diploid datasets, i.e. ploidy must equal 2.
- alleles : Optional[int] (default: None)
Genotype allele count, defaults to the alleles dimension of the provided dataset. If the alleles dimension is not present, then this value must be set explicitly. Currently HWE calculations are only supported for biallelic datasets, i.e. alleles must equal 2.
- merge : bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables. See Dataset merge behavior for more details.
Warning
This function is only applicable to diploid, biallelic datasets.
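The (N, 3) layout expected for genotype_count can be illustrated with a small pure-Python sketch. The helper name count_genotypes and the nested-list input are hypothetical and chosen only for illustration; treating calls with a negative allele as missing and skipping them is also an assumption of this sketch, not a statement about how sgkit computes the counts.

```python
def count_genotypes(calls):
    """Count genotypes per variant for diploid, biallelic calls.

    `calls` is a nested list shaped (variants, samples, 2) with alleles
    coded 0 (reference) or 1 (alternate). Calls containing a negative
    allele are treated as missing and skipped (assumption of this sketch).

    Returns one [het, hom_ref, hom_alt] row per variant, matching the
    column order described for `genotype_count`.
    """
    counts = []
    for variant in calls:
        het = hom_ref = hom_alt = 0
        for a, b in variant:
            if a < 0 or b < 0:
                continue  # missing call
            if a != b:
                het += 1
            elif a == 0:
                hom_ref += 1
            else:
                hom_alt += 1
        counts.append([het, hom_ref, hom_alt])
    return counts


# Two variants, four samples each; one call in the second variant is missing.
calls = [
    [[0, 0], [0, 1], [1, 1], [1, 0]],
    [[0, 0], [0, 0], [-1, -1], [0, 1]],
]
counts = count_genotypes(calls)
print(counts)
```

Each row sums to the number of non-missing samples for that variant, which is a quick sanity check on precomputed counts.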
- Return type
Dataset
- Returns
Dataset containing (N = num variants):
- variant_hwe_p_value [array-like, shape: (N, O)]
P values from HWE test for each variant as float in [0, 1].
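As a rough illustration of what an exact HWE p-value measures for a single variant, the sketch below enumerates every heterozygote count compatible with the observed allele counts, evaluates the conditional probability given in Wigginton et al. [1] for each, and sums the probabilities that are no larger than that of the observed count. The function name hwe_exact_p and its argument order are hypothetical; this is not sgkit's implementation, which may differ in numerical method and in its handling of edge cases.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_het, n_hom_ref, n_hom_alt):
    """Exact HWE p-value for one diploid, biallelic variant (sketch)."""
    n = n_het + n_hom_ref + n_hom_alt  # number of genotyped samples
    if n == 0:
        return float("nan")  # no data; NaN is a choice made for this sketch
    n_a = 2 * n_hom_ref + n_het  # reference allele count
    n_b = 2 * n_hom_alt + n_het  # alternate allele count

    def log_prob(h):
        # log P(N_AB = h | N, n_A) =
        #   log[ 2^h N! / (n_AA! h! n_BB!) * n_A! n_B! / (2N)! ]
        aa = (n_a - h) // 2
        bb = (n_b - h) // 2
        return (
            h * log(2)
            + lgamma(n + 1)
            - lgamma(aa + 1) - lgamma(h + 1) - lgamma(bb + 1)
            + lgamma(n_a + 1) + lgamma(n_b + 1)
            - lgamma(2 * n + 1)
        )

    # Feasible heterozygote counts share the parity of the allele counts
    # and cannot exceed the rarer allele's count.
    rare = min(n_a, n_b)
    probs = {h: exp(log_prob(h)) for h in range(rare % 2, rare + 1, 2)}

    p_obs = probs[n_het]
    # Sum outcomes at most as likely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(v for v in probs.values() if v <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Two samples, two copies of each allele: heterozygote counts 0 and 2
# have probabilities 1/3 and 2/3 respectively.
print(hwe_exact_p(0, 1, 1))
print(hwe_exact_p(2, 0, 0))
```

Working in log space with lgamma avoids overflow from large factorials. The full enumeration is linear in the rarer allele count per variant, which is fine for illustration but not tuned for genome-scale data.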
References
- [1] Wigginton, Janis E., David J. Cutler, and Goncalo R. Abecasis. 2005. “A Note on Exact Tests of Hardy-Weinberg Equilibrium.” American Journal of Human Genetics 76 (5): 887–93.
- Raises
NotImplementedError – If ploidy of provided dataset != 2
NotImplementedError – If maximum number of alleles in provided dataset != 2