sgkit.hardy_weinberg_test
- sgkit.hardy_weinberg_test(ds, *, genotype_count='variant_genotype_count', ploidy=None, alleles=None, merge=True)
- Exact test for Hardy-Weinberg equilibrium (HWE) as described in Wigginton et al. 2005 [1].
- Parameters:
- ds Dataset
- Dataset containing genotype calls or precomputed genotype counts. 
- genotype_count Optional[Hashable] (default: 'variant_genotype_count')
- Name of the variable containing precomputed genotype counts for each variant, as described in sgkit.variables.variant_genotype_count_spec. If the variable is not present in ds, it will be computed using count_variant_genotypes(), which automatically assigns coordinates to the genotypes dimension.
- ploidy Optional[int] (default: None)
- Genotype ploidy, defaults to the ploidy dimension of the provided dataset. If the ploidy dimension is not present, then this value must be set explicitly. Currently HWE calculations are only supported for diploid datasets, i.e. ploidy must equal 2.
- alleles Optional[int] (default: None)
- Genotype allele count, defaults to the alleles dimension of the provided dataset. If the alleles dimension is not present, then this value must be set explicitly. Currently HWE calculations are only supported for biallelic datasets, i.e. alleles must equal 2.
- merge bool (default: True)
- If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details. 
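
To illustrate the layout that genotype_count is expected to have, here is a minimal pure-Python sketch that tallies diploid, biallelic calls for one variant into the three genotype classes. It is not sgkit's count_variant_genotypes(); the name count_genotypes is hypothetical, and treating negative allele values as missing calls is an assumption made for this example.

```python
from collections import Counter

# Genotype labels in the order the test expects them.
GENOTYPES = ("0/0", "0/1", "1/1")


def count_genotypes(calls):
    """Tally (allele, allele) pairs for a single variant (hypothetical helper).

    Returns counts ordered as hom-ref, het, hom-alt.
    """
    counts = Counter()
    for a, b in calls:
        # Assumption: a negative allele marks a missing call and is skipped.
        if a < 0 or b < 0:
            continue
        lo, hi = sorted((a, b))  # unphased: 1/0 and 0/1 are the same genotype
        counts[f"{lo}/{hi}"] += 1
    return [counts[g] for g in GENOTYPES]


calls = [(0, 0), (0, 1), (1, 0), (1, 1), (-1, -1)]
print(dict(zip(GENOTYPES, count_genotypes(calls))))
```

Stacking one such row per variant gives an array with three columns along the genotypes dimension.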
 
- Warning: This function is only applicable to diploid, biallelic datasets. The genotype_count array should have three columns corresponding to the genotypes dimension. These columns should have coordinates '0/0', '0/1', and '1/1', which respectively contain counts for homozygous reference, heterozygous, and homozygous alternate genotypes.
- Return type: Dataset
- Returns:
- Dataset containing (N = num variants):
- variant_hwe_p_value [array-like, shape: (N, O)]
- P values from HWE test for each variant as float in [0, 1].
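
For intuition about where the P value for one variant comes from, the sketch below computes an exact HWE P value by brute-force enumeration: with allele counts held fixed, it sums the probabilities of every heterozygote count that is no more likely than the observed one. This is an illustration of the exact test, not sgkit's implementation; the function name is hypothetical, returning NaN when there are no samples is an assumption, and the factorial-based arithmetic is only practical for small sample sizes.

```python
from fractions import Fraction
from math import factorial


def hwe_exact_p_value(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Exact HWE P value for one diploid, biallelic variant (illustrative)."""
    if min(n_hom_ref, n_het, n_hom_alt) < 0:
        raise ValueError("genotype counts must be non-negative")
    n = n_hom_ref + n_het + n_hom_alt  # number of genotyped samples
    if n == 0:
        return float("nan")  # assumption: no data means no P value
    n_ref = 2 * n_hom_ref + n_het  # copies of the reference allele
    n_alt = 2 * n_hom_alt + n_het  # copies of the alternate allele

    def weight(h: int) -> int:
        # Unnormalised probability of h heterozygotes given the allele counts:
        # 2**h * n! / (hom_ref! * h! * hom_alt!), which is an exact integer.
        hom_ref = (n_ref - h) // 2
        hom_alt = (n_alt - h) // 2
        return 2**h * factorial(n) // (
            factorial(hom_ref) * factorial(h) * factorial(hom_alt)
        )

    # Feasible heterozygote counts share the parity of the allele counts.
    weights = [weight(h) for h in range(n_alt % 2, min(n_ref, n_alt) + 1, 2)]
    observed = weight(n_het)
    # Sum all outcomes that are no more likely than the observed one.
    tail = sum(w for w in weights if w <= observed)
    return float(Fraction(tail, sum(weights)))


# Two samples, one of each homozygote: only 1 of the 3 possible pairings
# of the alleles A, A, B, B produces no heterozygotes.
print(hwe_exact_p_value(1, 0, 1))
```

Integer weights avoid floating-point ties when comparing outcome probabilities; a production implementation would typically use a recurrence over heterozygote counts instead of factorials.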
 
- Raises:
- NotImplementedError – If the dataset is not limited to biallelic, diploid genotypes. 
- ValueError – If the ploidy or the number of alleles is not specified and is not present as a dimension in the dataset.
- ValueError – If no coordinates are assigned to the genotypes dimension.
- KeyError – If the genotypes '0/0', '0/1' or '1/1' are not specified as coordinates of the genotypes dimension.
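
The documented error conditions can be summarised as a small pre-flight check. The sketch below is a hypothetical helper written for illustration, not part of sgkit; the function name, its arguments, and the order in which the checks run are all assumptions. It may be useful for validating inputs before calling the real function.

```python
REQUIRED_GENOTYPES = ("0/0", "0/1", "1/1")


def check_hwe_inputs(ploidy, alleles, genotype_coords):
    """Mirror the documented error conditions (hypothetical helper).

    ploidy, alleles: ints, or None when neither given nor found as dimensions.
    genotype_coords: labels on the genotypes dimension, or None if unlabelled.
    """
    if ploidy is None or alleles is None:
        raise ValueError("ploidy and alleles must be given or present as dimensions")
    if ploidy != 2 or alleles != 2:
        raise NotImplementedError("only diploid, biallelic data is supported")
    if genotype_coords is None:
        raise ValueError("no coordinates assigned to the genotypes dimension")
    missing = [g for g in REQUIRED_GENOTYPES if g not in genotype_coords]
    if missing:
        raise KeyError(f"missing genotype coordinates: {missing}")


# Passes silently for a valid diploid, biallelic layout.
check_hwe_inputs(2, 2, ["0/0", "0/1", "1/1"])
```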
 
- References
- [1] Wigginton, Janis E., David J. Cutler, and Goncalo R. Abecasis. 2005. “A Note on Exact Tests of Hardy-Weinberg Equilibrium.” American Journal of Human Genetics 76 (5): 887–93.
 
 
 
    