sgkit.sample_stats

sgkit.sample_stats(ds, *, call_genotype='call_genotype', merge=True)

Compute quality control sample statistics from genotype calls.

Parameters:
ds Dataset

Dataset containing genotype calls.

call_genotype Hashable (default: 'call_genotype')

Input variable name holding call_genotype. Defined by sgkit.variables.call_genotype_spec. Must be present in ds.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing the following variables:

Note

If the dataset contains partial genotype calls (i.e., genotype calls with a mixture of called and missing alleles), these genotypes will be ignored when counting the number of homozygous, heterozygous or total genotype calls.
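The counting rule in this note can be illustrated with a minimal pure-Python sketch. It is not sgkit's implementation: the helper `count_calls` and the dictionary keys are hypothetical, and the sketch assumes missing alleles are encoded as -1.

```python
MISSING = -1  # assumption: sentinel value for a missing allele


def count_calls(genotypes):
    """Count calls for one sample, skipping any call with a missing allele.

    Hypothetical helper for illustration only; not part of sgkit.
    """
    counts = {"called": 0, "het": 0, "hom": 0}
    for alleles in genotypes:
        if any(a == MISSING for a in alleles):
            # Fully missing and partial calls are both ignored.
            continue
        counts["called"] += 1
        if len(set(alleles)) == 1:
            counts["hom"] += 1
        else:
            counts["het"] += 1
    return counts


# One diploid sample across five variants; the third call is partial
# and the fourth is fully missing.
sample = [(0, 0), (0, 1), (1, MISSING), (MISSING, MISSING), (1, 1)]
result = count_calls(sample)
print(result)
```

In this sketch the partial call `(1, MISSING)` contributes to none of the three counts, even though one of its alleles was observed.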

Note

When used on autopolyploid genotypes, this method treats genotype calls with any level of heterozygosity as ‘heterozygous’. Only fully homozygous genotype calls (e.g. 0/0/0/0) will be classified as ‘homozygous’.
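As a rough illustration of this classification, the sketch below labels a few tetraploid calls. The function `zygosity` is a hypothetical helper written for this example, not an sgkit function, and it assumes every allele in the call is present (no missing values).

```python
def zygosity(alleles):
    """Label a fully called genotype by whether all of its alleles are identical.

    Hypothetical helper for illustration only; not part of sgkit.
    """
    return "homozygous" if len(set(alleles)) == 1 else "heterozygous"


# Tetraploid calls: 0/0/0/0, 0/0/0/1, 0/1/1/2, 1/1/1/1
calls = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 1, 1, 2), (1, 1, 1, 1)]
labels = [zygosity(c) for c in calls]

for call, label in zip(calls, labels):
    print("/".join(map(str, call)), label)
```

Here 0/0/0/1 and 0/1/1/2 receive the same label: under the rule described above, the degree of heterozygosity is not distinguished.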

Warning

This method does not support mixed-ploidy datasets.

Raises:

ValueError – If the dataset contains mixed-ploidy genotype calls.