sgkit.variant_stats

sgkit.variant_stats(ds, *, call_genotype='call_genotype', variant_allele_count='variant_allele_count', merge=True)

Compute quality control variant statistics from genotype calls.

Parameters:
ds Dataset

Dataset containing genotype calls.

call_genotype Hashable (default: 'call_genotype')

Input variable name holding call_genotype, as defined by sgkit.variables.call_genotype_spec. Must be present in ds.

variant_allele_count Hashable (default: 'variant_allele_count')

Input variable name holding variant_allele_count, as defined by sgkit.variables.variant_allele_count_spec. If the variable is not present in ds, it will be computed using count_variant_alleles().

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing the following variables:

Note

If the dataset contains partial genotype calls (i.e., genotype calls with a mixture of called and missing alleles), these genotypes will be ignored when counting the number of homozygous, heterozygous or total genotype calls. However, the called alleles will be counted when calculating allele counts and frequencies using count_variant_alleles().
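The counting rule in this note can be illustrated with a small pure-Python sketch. It is not sgkit's implementation: the helper name summarize_variant, the returned keys, and the use of -1 to encode a missing allele are assumptions made for illustration only.

```python
# Assumption for this sketch: a missing allele is encoded as -1.
MISSING = -1


def summarize_variant(genotypes, n_alleles=2):
    """Tally the calls of one variant (hypothetical helper, not part of sgkit)."""
    n_called = n_het = n_hom = 0
    allele_count = [0] * n_alleles
    for call in genotypes:
        # Allele counts use every called allele, including those
        # that belong to partial genotype calls.
        for allele in call:
            if allele != MISSING:
                allele_count[allele] += 1
        # Genotype-level counts skip any call with a missing allele.
        if any(allele == MISSING for allele in call):
            continue
        n_called += 1
        if len(set(call)) == 1:
            n_hom += 1
        else:
            n_het += 1
    return {
        "n_called": n_called,
        "n_het": n_het,
        "n_hom": n_hom,
        "allele_count": allele_count,
    }


# Four diploid samples: 0/0, 0/1, a partial call 1/., and a fully missing call.
calls = [(0, 0), (0, 1), (1, MISSING), (MISSING, MISSING)]
result = summarize_variant(calls)
```

Here the partial call 1/. contributes one copy of allele 1 to the allele counts but is left out of the called, homozygous and heterozygous totals.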

Note

When used on autopolyploid genotypes, this method treats genotype calls with any level of heterozygosity as ‘heterozygous’. Only fully homozygous genotype calls (e.g. 0/0/0/0) will be classified as ‘homozygous’.
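As a rough illustration of this classification rule, the following sketch labels fully called tetraploid genotypes. The function classify_call is a hypothetical helper, not an sgkit function, and it assumes that every allele in the call is present (no missing values).

```python
def classify_call(call):
    """Label a fully called genotype of any ploidy (hypothetical helper)."""
    # A call is homozygous only if all of its alleles are identical;
    # any mixture of alleles counts as heterozygous.
    return "homozygous" if len(set(call)) == 1 else "heterozygous"


tetraploid_calls = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)]
labels = [classify_call(call) for call in tetraploid_calls]
```

Under this rule, 0/0/0/1 and 0/0/1/1 fall into the same ‘heterozygous’ class even though their allele dosages differ.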

Warning

This method does not support mixed-ploidy datasets.

Raises:

ValueError – If the dataset contains mixed-ploidy genotype calls.