sgkit.variant_stats
- sgkit.variant_stats(ds, *, call_genotype='call_genotype', variant_allele_count='variant_allele_count', merge=True)
Compute quality control variant statistics from genotype calls.
- Parameters:
  - ds (Dataset): Dataset containing genotype calls.
  - call_genotype (Hashable, default: 'call_genotype'): Input variable name holding call_genotype, as defined by sgkit.variables.call_genotype_spec. Must be present in ds.
  - variant_allele_count (Hashable, default: 'variant_allele_count'): Input variable name holding variant_allele_count, as defined by sgkit.variables.variant_allele_count_spec. If the variable is not present in ds, it will be computed using count_variant_alleles().
  - merge (bool, default: True): If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.
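When variant_allele_count is not present in ds, it is computed with count_variant_alleles(). For a single variant, that per-allele count can be sketched in plain Python as follows. This is not sgkit's implementation. The tuple-of-allele-indices encoding (with -1 for a missing allele) and the helper name count_alleles are assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed encoding for this sketch: one genotype call is a tuple of allele
# indices, and -1 marks a missing allele.
Genotype = Sequence[int]


def count_alleles(genotypes: Sequence[Genotype], n_alleles: int) -> List[int]:
    """Count occurrences of each allele across one variant's samples.

    Missing alleles (negative values) are skipped, so the called allele of a
    partial genotype such as (0, -1) still contributes to the count.
    """
    counts = [0] * n_alleles
    for genotype in genotypes:
        for allele in genotype:
            if allele >= 0:
                counts[allele] += 1
    return counts


# One biallelic variant observed in four diploid samples.
variant = [(0, 0), (0, 1), (1, 1), (0, -1)]
print(count_alleles(variant, n_alleles=2))  # [4, 3]
```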
- Return type: Dataset
- Returns:
A dataset containing the following variables:
  - sgkit.variables.variant_n_called_spec (variants): The number of samples with called genotypes.
  - sgkit.variables.variant_call_rate_spec (variants): The fraction of samples with called genotypes.
  - sgkit.variables.variant_n_het_spec (variants): The number of samples with heterozygous calls.
  - sgkit.variables.variant_n_hom_ref_spec (variants): The number of samples with homozygous reference calls.
  - sgkit.variables.variant_n_hom_alt_spec (variants): The number of samples with homozygous alternate calls.
  - sgkit.variables.variant_n_non_ref_spec (variants): The number of samples that are not homozygous reference calls.
  - sgkit.variables.variant_allele_count_spec (variants, alleles): The number of occurrences of each allele.
  - sgkit.variables.variant_allele_total_spec (variants): The number of occurrences of all alleles.
  - sgkit.variables.variant_allele_frequency_spec (variants, alleles): The frequency of occurrence of each allele.
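To make the definitions above concrete, here is a plain-Python sketch that derives each listed statistic for one variant. It follows the descriptions above and the note on partial calls, not sgkit's vectorized code. Several details are assumptions: the input encoding (tuples of allele indices, -1 for missing), the function name variant_stats_sketch, and the choice to count variant_n_non_ref as n_called minus n_hom_ref (that is, among fully called genotypes only).

```python
from typing import Dict, List, Sequence, Union

Genotype = Sequence[int]  # allele indices; -1 marks a missing allele (assumed encoding)


def variant_stats_sketch(
    genotypes: Sequence[Genotype], n_alleles: int
) -> Dict[str, Union[int, float, List[int], List[float]]]:
    """Compute the listed per-variant statistics for a single variant.

    Only fully called genotypes count towards n_called, n_het and n_hom_*;
    every called allele (including those in partial calls) counts towards
    the allele counts.
    """
    n_samples = len(genotypes)
    n_called = n_het = n_hom_ref = n_hom_alt = 0
    allele_count = [0] * n_alleles
    for genotype in genotypes:
        called = [a for a in genotype if a >= 0]
        for allele in called:
            allele_count[allele] += 1
        if len(called) < len(genotype):
            continue  # missing or partial call: excluded from genotype counts
        n_called += 1
        if len(set(called)) > 1:
            n_het += 1  # any mixture of alleles, at any ploidy
        elif called[0] == 0:
            n_hom_ref += 1
        else:
            n_hom_alt += 1
    allele_total = sum(allele_count)
    return {
        "variant_n_called": n_called,
        "variant_call_rate": n_called / n_samples,
        "variant_n_het": n_het,
        "variant_n_hom_ref": n_hom_ref,
        "variant_n_hom_alt": n_hom_alt,
        "variant_n_non_ref": n_called - n_hom_ref,  # assumption, see lead-in
        "variant_allele_count": allele_count,
        "variant_allele_total": allele_total,
        "variant_allele_frequency": [
            c / allele_total if allele_total else float("nan") for c in allele_count
        ],
    }


# Five diploid samples: three full calls, one partial call, one missing call.
variant = [(0, 0), (0, 1), (1, 1), (0, -1), (-1, -1)]
stats = variant_stats_sketch(variant, n_alleles=2)
print(stats["variant_n_called"], stats["variant_call_rate"])  # 3 0.6
print(stats["variant_allele_count"], stats["variant_allele_total"])  # [4, 3] 7
```

Here the partial call (0, -1) adds one reference allele to the allele counts but is not counted as a called genotype, which is why the call rate is 0.6 while the allele total is 7.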
Note
If the dataset contains partial genotype calls (i.e., genotype calls with a mixture of called and missing alleles), these genotypes will be ignored when counting the number of homozygous, heterozygous, or total genotype calls. However, their called alleles will still be counted when calculating allele counts and frequencies using count_variant_alleles().
Note
When used on autopolyploid genotypes, this method treats genotype calls with any level of heterozygosity as ‘heterozygous’. Only fully homozygous genotype calls (e.g. 0/0/0/0) will be classified as ‘homozygous’.
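The per-call rules from the two notes above can be expressed as a small classifier. classify_call is a hypothetical helper, not an sgkit function, and the tuple encoding with -1 for a missing allele is an assumption. Any missing allele makes a call partial or missing and excludes it from the genotype counts. A call is homozygous only if all of its alleles are identical, whatever the ploidy.

```python
from typing import Optional, Sequence


def classify_call(genotype: Sequence[int]) -> Optional[str]:
    """Classify one genotype call following the notes above.

    Returns None for missing or partial calls (any allele < 0), "het" for any
    mixture of alleles at any ploidy, and "hom_ref" / "hom_alt" only when
    every allele in the call is identical.
    """
    if any(allele < 0 for allele in genotype):
        return None
    if len(set(genotype)) > 1:
        return "het"
    return "hom_ref" if genotype[0] == 0 else "hom_alt"


# Tetraploid examples.
print(classify_call((0, 0, 0, 0)))   # hom_ref
print(classify_call((0, 0, 0, 1)))   # het: a single alternate allele is enough
print(classify_call((1, 1, 2, 2)))   # het
print(classify_call((0, 0, -1, 0)))  # None: partial call, skipped
```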
Warning
This method does not support mixed-ploidy datasets.
- Raises:
ValueError – If the dataset contains mixed-ploidy genotype calls.
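One way to picture the mixed-ploidy restriction is a check that rejects padded calls before any statistics are computed. This sketch assumes, for illustration only, that calls with fewer alleles than the array's maximum ploidy are padded with a -2 "non-allele" fill value that is distinct from -1 (missing). check_uniform_ploidy is a hypothetical helper, not part of sgkit, and sgkit's own detection may work differently.

```python
from typing import Sequence

# Assumed padding convention for this sketch: -2 marks a non-allele fill,
# while -1 marks a genuinely missing allele.
NON_ALLELE = -2


def check_uniform_ploidy(genotypes: Sequence[Sequence[int]]) -> None:
    """Raise ValueError if any call contains padding, i.e. the ploidy is mixed."""
    for index, genotype in enumerate(genotypes):
        if NON_ALLELE in genotype:
            raise ValueError(f"Mixed-ploidy genotype call at index {index}: {genotype}")


check_uniform_ploidy([(0, 1, 1, 0), (0, 0, -1, 1)])  # passes: missing is not padding
try:
    # A diploid call padded to tetraploid width.
    check_uniform_ploidy([(0, 1, 1, 0), (0, 1, -2, -2)])
except ValueError as err:
    print(err)
```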
See also