sgkit.hardy_weinberg_test
- sgkit.hardy_weinberg_test(ds, *, genotype_count='variant_genotype_count', ploidy=None, alleles=None, merge=True)
Exact test for Hardy-Weinberg equilibrium (HWE), as described in Wigginton et al. 2005 [1].
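Conditioning on the number of minor-allele copies, the exact test assigns a probability to every possible heterozygote count and reports the total probability of the counts that are no more likely than the observed one. A minimal pure-Python sketch of that procedure, assuming the recurrence published by Wigginton et al. 2005 [1], is below. The name hwe_exact_p is hypothetical and not part of sgkit, and sgkit's own implementation may differ in details such as how probabilities are compared in floating point.

```python
def hwe_exact_p(hom_ref: int, het: int, hom_alt: int) -> float:
    """Exact HWE p-value for one diploid, biallelic variant (hypothetical helper).

    Sketch of the Wigginton et al. (2005) recurrence: condition on the observed
    allele counts, compute the relative probability of every possible
    heterozygote count, and sum those no larger than the observed one.
    """
    if min(hom_ref, het, hom_alt) < 0:
        raise ValueError("genotype counts must be non-negative")
    n = hom_ref + het + hom_alt  # number of genotyped individuals
    if n == 0:
        raise ValueError("at least one genotype is required")
    hom_common = max(hom_ref, hom_alt)
    hom_rare = min(hom_ref, hom_alt)
    rare = 2 * hom_rare + het  # copies of the minor allele

    probs = [0.0] * (rare + 1)
    # Start near the most likely heterozygote count; it must share rare's parity.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0  # unnormalized; rescaled by the total at the end

    # Walk down: two heterozygotes become one of each homozygote.
    h, hr = mid, (rare - mid) // 2
    hc = n - h - hr
    while h > 1:
        probs[h - 2] = probs[h] * h * (h - 1) / (4.0 * (hr + 1) * (hc + 1))
        h, hr, hc = h - 2, hr + 1, hc + 1

    # Walk up: one of each homozygote becomes two heterozygotes.
    h, hr = mid, (rare - mid) // 2
    hc = n - h - hr
    while h <= rare - 2:
        probs[h + 2] = probs[h] * 4.0 * hr * hc / ((h + 2) * (h + 1))
        h, hr, hc = h + 2, hr - 1, hc - 1

    total = sum(probs)
    observed = probs[het]
    # Sum the probabilities of all outcomes at most as likely as the observed one.
    p = sum(q for q in probs if q <= observed) / total
    return min(1.0, p)
```

For example, hwe_exact_p(25, 50, 25) returns 1.0, because 50 heterozygotes is the most likely count given 100 minor-allele copies among 100 individuals, whereas hwe_exact_p(0, 100, 0) (an extreme heterozygote excess) is vanishingly small.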
- Parameters:
- ds : Dataset
Dataset containing genotype calls or precomputed genotype counts.
- genotype_count : Hashable | None, optional (default: 'variant_genotype_count')
Name of the variable containing precomputed genotype counts for each variant, as described in sgkit.variables.variant_genotype_count_spec. If the variable is not present in ds, it will be computed using count_variant_genotypes(), which automatically assigns coordinates to the genotypes dimension.
- ploidy : int | None, optional (default: None)
Genotype ploidy; defaults to the size of the ploidy dimension of the provided dataset. If the ploidy dimension is not present, this value must be set explicitly. HWE calculations are currently supported only for diploid datasets, i.e. ploidy must equal 2.
- alleles : int | None, optional (default: None)
Number of alleles per genotype; defaults to the size of the alleles dimension of the provided dataset. If the alleles dimension is not present, this value must be set explicitly. HWE calculations are currently supported only for biallelic datasets, i.e. alleles must equal 2.
- merge : bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.
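The genotype_count variable holds one row of three counts per variant. When it is absent, sgkit derives it from the genotype calls with count_variant_genotypes(). The pure-Python sketch below illustrates the same idea for diploid, biallelic calls; it is not sgkit's implementation, genotype_count_table is a hypothetical name, and skipping any call with a negative allele index is an assumption modelled on sgkit's convention of encoding missing alleles as -1.

```python
from typing import List, Sequence, Tuple

# Column order matching the '0/0', '0/1', '1/1' coordinates of the genotypes dimension.
GENOTYPE_COLUMNS = ("0/0", "0/1", "1/1")


def genotype_count_table(
    calls: Sequence[Sequence[Tuple[int, int]]],
) -> List[List[int]]:
    """Count diploid, biallelic genotypes per variant (hypothetical helper).

    calls[v][s] is the (allele_1, allele_2) call of sample s at variant v.
    Returns one [n_hom_ref, n_het, n_hom_alt] row per variant.
    """
    table = []
    for variant_calls in calls:
        row = [0, 0, 0]
        for a, b in variant_calls:
            if a < 0 or b < 0:
                continue  # assumed: calls with a missing allele are not counted
            if a > 1 or b > 1:
                raise NotImplementedError("only biallelic calls are supported")
            # Unordered genotypes: 1/0 counts as 0/1, so the column index is
            # simply the number of alternate alleles in the call.
            row[a + b] += 1
        table.append(row)
    return table


calls = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(0, 0), (0, 0), (-1, 0), (1, 1)],
]
print(genotype_count_table(calls))  # [[1, 2, 1], [2, 0, 1]]
```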
Warning
This function is only applicable to diploid, biallelic datasets. The genotype_count array should have three columns corresponding to the genotypes dimension. These columns should have the coordinates '0/0', '0/1' and '1/1', which respectively contain counts for homozygous reference, heterozygous and homozygous alternate genotypes.
- Return type: Dataset
- Returns:
Dataset containing (N = num variants):
- variant_hwe_p_value [array-like, shape: (N,)]
P values from the HWE test for each variant, as floats in [0, 1].
- Raises:
NotImplementedError – If the dataset is not limited to biallelic, diploid genotypes.
ValueError – If the ploidy or number of alleles is not specified and is not present as a dimension in the dataset.
ValueError – If no coordinates are assigned to the genotypes dimension.
KeyError – If the genotypes '0/0', '0/1' or '1/1' are not specified as coordinates of the genotypes dimension.
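The error conditions listed above can be pictured as a sequence of input checks. The sketch below is hypothetical: check_hwe_inputs is not an sgkit function, it takes plain dimension sizes and coordinate labels instead of an xarray Dataset, and the order of the checks is an assumption. It returns the column index of each required genotype label, on the reading that count columns are identified by coordinate label rather than by position.

```python
from typing import Dict, Mapping, Optional, Sequence

REQUIRED_GENOTYPES = ("0/0", "0/1", "1/1")


def check_hwe_inputs(
    dims: Mapping[str, int],
    genotype_coords: Optional[Sequence[str]],
    ploidy: Optional[int] = None,
    alleles: Optional[int] = None,
) -> Dict[str, int]:
    """Mirror the documented error conditions (hypothetical helper)."""
    # Fall back to dimension sizes when ploidy/alleles are not given explicitly.
    if ploidy is None:
        if "ploidy" not in dims:
            raise ValueError("ploidy not specified and no 'ploidy' dimension present")
        ploidy = dims["ploidy"]
    if alleles is None:
        if "alleles" not in dims:
            raise ValueError("alleles not specified and no 'alleles' dimension present")
        alleles = dims["alleles"]
    if ploidy != 2 or alleles != 2:
        raise NotImplementedError("only biallelic, diploid genotypes are supported")
    if genotype_coords is None:
        raise ValueError("no coordinates assigned to the 'genotypes' dimension")
    labels = list(genotype_coords)
    missing = [g for g in REQUIRED_GENOTYPES if g not in labels]
    if missing:
        raise KeyError(f"'genotypes' dimension lacks coordinates: {missing}")
    # Map each required label to its column so counts are read by label.
    return {g: labels.index(g) for g in REQUIRED_GENOTYPES}
```

With this design, a genotype_count array whose columns are stored as '1/1', '0/0', '0/1' would still be read correctly, since the returned mapping tells the caller which column holds each genotype.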
References
- [1] Wigginton, Janis E., David J. Cutler, and Goncalo R. Abecasis. 2005. “A Note on Exact Tests of Hardy-Weinberg Equilibrium.” American Journal of Human Genetics 76 (5): 887–93.