sgkit.hardy_weinberg_test
- sgkit.hardy_weinberg_test(ds, *, genotype_count='variant_genotype_count', ploidy=None, alleles=None, merge=True)
Exact test for Hardy-Weinberg equilibrium (HWE) as described in Wigginton et al. 2005 [1].
- Parameters:
- ds
Dataset: Dataset containing genotype calls or precomputed genotype counts.
- genotype_count
Optional[Hashable] (default: 'variant_genotype_count'): Name of variable containing precomputed genotype counts for each variant as described in sgkit.variables.variant_genotype_count_spec. If the variable is not present in ds, it will be computed using count_variant_genotypes(), which automatically assigns coordinates to the genotypes dimension.
- ploidy
Optional[int] (default: None): Genotype ploidy, defaults to the ploidy dimension of the provided dataset. If the ploidy dimension is not present, then this value must be set explicitly. Currently HWE calculations are only supported for diploid datasets, i.e. ploidy must equal 2.
- alleles
Optional[int] (default: None): Genotype allele count, defaults to the alleles dimension of the provided dataset. If the alleles dimension is not present, then this value must be set explicitly. Currently HWE calculations are only supported for biallelic datasets, i.e. alleles must equal 2.
- merge
bool (default: True): If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
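To make the expected layout of the genotype counts concrete, here is a minimal pure-Python sketch that tallies diploid, biallelic calls for a single variant into the three genotype columns '0/0', '0/1', and '1/1'. It is an illustration only, not sgkit's count_variant_genotypes() implementation. The helper name count_genotypes and the convention that negative allele values mark missing calls are assumptions made for this example.

```python
GENOTYPES = ("0/0", "0/1", "1/1")


def count_genotypes(calls):
    """Tally diploid, biallelic calls for ONE variant.

    `calls` is an iterable of (allele_1, allele_2) pairs with alleles
    coded 0 (reference) or 1 (alternate). Hypothetical helper, not part
    of sgkit.
    """
    counts = {genotype: 0 for genotype in GENOTYPES}
    for a, b in calls:
        if a < 0 or b < 0:
            # Assumption of this sketch: negative values mark missing calls.
            continue
        if a > 1 or b > 1:
            raise NotImplementedError("only biallelic calls are handled")
        lo, hi = sorted((a, b))  # phase is ignored: 1/0 counts as 0/1
        counts[f"{lo}/{hi}"] += 1
    # Column order: homozygous reference, heterozygous, homozygous alternate.
    return [counts[genotype] for genotype in GENOTYPES]


row = count_genotypes([(0, 0), (0, 1), (1, 0), (1, 1), (-1, -1)])
print(dict(zip(GENOTYPES, row)))
```

One such row per variant gives an array with three columns, which is the shape the genotypes dimension is described as having.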
Warning
This function is only applicable to diploid, biallelic datasets. The genotype_count array should have three columns corresponding to the genotypes dimension. These columns should have coordinates '0/0', '0/1', and '1/1', which respectively contain counts for homozygous reference, heterozygous, and homozygous alternate genotypes.
- Return type:
Dataset
- Returns:
Dataset containing (N = num variants):
- variant_hwe_p_value [array-like, shape: (N, O)]
P values from HWE test for each variant as float in [0, 1].
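For intuition about what a single p-value represents, the following is a minimal sketch of an exact HWE test for one variant, written from the general idea in Wigginton et al. 2005 [1]: condition on the observed allele counts, enumerate every possible heterozygote count, and sum the probabilities of all outcomes that are no more likely than the observed one. It is not sgkit's implementation. It uses exact rational arithmetic for clarity rather than speed, and the function name hwe_exact_p_value and the choice to return NaN when there are no genotypes are assumptions of this sketch.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_p_value(n_hom_ref, n_het, n_hom_alt):
    """Exact HWE p-value for one diploid, biallelic variant (illustrative)."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return float("nan")  # assumption of this sketch: no data, no p-value
    n_alt = 2 * n_hom_alt + n_het  # copies of the alternate allele
    n_ref = 2 * n - n_alt          # copies of the reference allele

    # Given the allele counts, the heterozygote count h must have the same
    # parity as n_alt. Its probability is proportional to
    # 2**h / (n_hom_ref! * h! * n_hom_alt!); the remaining factor is the
    # same for every h and cancels when normalizing.
    weights = {}
    for h in range(n_alt % 2, min(n_alt, n_ref) + 1, 2):
        hom_ref = (n_ref - h) // 2
        hom_alt = (n_alt - h) // 2
        weights[h] = Fraction(
            2 ** h, factorial(hom_ref) * factorial(h) * factorial(hom_alt)
        )

    total = sum(weights.values())
    observed = weights[n_het]
    # Sum over all outcomes that are as likely or less likely than observed.
    p = sum(w for w in weights.values() if w <= observed) / total
    return float(p)


# Two samples, two copies of each allele: the possible outcomes are
# "no heterozygotes" (probability 1/3) and "two heterozygotes" (2/3).
print(hwe_exact_p_value(1, 0, 1))
print(hwe_exact_p_value(0, 2, 0))
```

Factorials grow quickly, so this form is only practical for small sample sizes; the recurrence described in [1] avoids computing them directly.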
- Raises:
NotImplementedError – If the dataset is not limited to biallelic, diploid genotypes.
ValueError – If the ploidy or number of alleles are not specified and not present as dimensions in the dataset.
ValueError – If no coordinates are assigned to the genotypes dimension.
KeyError – If the genotypes '0/0', '0/1' or '1/1' are not specified as coordinates of the genotypes dimension.
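The error conditions listed above can be mirrored in a small pre-check that runs before any counts are handed to the test. This is a hypothetical helper written for illustration, not sgkit code: the name check_hwe_inputs, its arguments (a plain mapping of dimension sizes and a list of genotype labels), and the order in which the checks run are all assumptions.

```python
REQUIRED_GENOTYPES = ("0/0", "0/1", "1/1")


def check_hwe_inputs(dims, genotype_coords, ploidy=None, alleles=None):
    """Validate inputs in the spirit of the documented exceptions.

    `dims` maps dimension names to sizes; `genotype_coords` holds the
    labels on the genotypes dimension, or None if no coordinates are
    assigned. Hypothetical helper, not part of sgkit.
    """
    # Fall back to the dataset dimensions when values are not given.
    if ploidy is None:
        ploidy = dims.get("ploidy")
    if alleles is None:
        alleles = dims.get("alleles")
    if ploidy is None or alleles is None:
        raise ValueError(
            "ploidy and alleles must be specified or present as dimensions"
        )
    if ploidy != 2 or alleles != 2:
        raise NotImplementedError("only diploid, biallelic data are supported")
    if genotype_coords is None:
        raise ValueError("no coordinates assigned to the genotypes dimension")
    missing = [g for g in REQUIRED_GENOTYPES if g not in genotype_coords]
    if missing:
        raise KeyError(f"missing genotype coordinates: {missing}")
    return ploidy, alleles


print(check_hwe_inputs({"ploidy": 2, "alleles": 2}, ["0/0", "0/1", "1/1"]))
```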
References
- [1] Wigginton, Janis E., David J. Cutler, and Goncalo R. Abecasis. 2005.
“A Note on Exact Tests of Hardy-Weinberg Equilibrium.” American Journal of Human Genetics 76 (5): 887–93.