sgkit.sample_stats
- sgkit.sample_stats(ds, *, call_genotype='call_genotype', merge=True)
Compute quality control sample statistics from genotype calls.
- Parameters:
  - ds (Dataset): Dataset containing genotype calls.
  - call_genotype (Hashable, default: 'call_genotype'): Input variable name holding call_genotype. Defined by sgkit.variables.call_genotype_spec. Must be present in ds.
  - merge (bool, default: True): If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type: Dataset
- Returns:
A dataset containing the following variables:
  - sgkit.variables.sample_n_called_spec (samples): The number of variants with called genotypes.
  - sgkit.variables.sample_call_rate_spec (samples): The fraction of variants with called genotypes.
  - sgkit.variables.sample_n_het_spec (samples): The number of variants with heterozygous calls.
  - sgkit.variables.sample_n_hom_ref_spec (samples): The number of variants with homozygous reference calls.
  - sgkit.variables.sample_n_hom_alt_spec (samples): The number of variants with homozygous alternate calls.
  - sgkit.variables.sample_n_non_ref_spec (samples): The number of variants that are not homozygous reference calls.
Note
If the dataset contains partial genotype calls (i.e., genotype calls with a mixture of called and missing alleles), these genotypes will be ignored when counting the number of homozygous, heterozygous or total genotype calls.
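The counting rules described above can be illustrated with a small plain-Python sketch. This is not sgkit's implementation. It assumes missing alleles are encoded as -1 and that the non-reference count is taken over called genotypes only. The helper names `classify_call` and `sample_stats_sketch` are hypothetical.

```python
# Illustrative sketch only; not how sgkit computes these statistics.
# Assumption: a missing allele is encoded as -1.

def classify_call(alleles):
    """Return 'het', 'hom_ref', 'hom_alt', or None for an ignored call."""
    if any(a < 0 for a in alleles):
        return None  # fully missing or partial call: ignored in all counts
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == 0 else "hom_alt"


def sample_stats_sketch(genotypes):
    """Per-sample counts; genotypes[v][s] is the allele tuple of variant v, sample s."""
    n_variants = len(genotypes)
    n_samples = len(genotypes[0])
    stats = []
    for s in range(n_samples):
        counts = {"het": 0, "hom_ref": 0, "hom_alt": 0}
        for v in range(n_variants):
            kind = classify_call(genotypes[v][s])
            if kind is not None:
                counts[kind] += 1
        n_called = sum(counts.values())
        stats.append({
            "n_called": n_called,
            "call_rate": n_called / n_variants,
            "n_het": counts["het"],
            "n_hom_ref": counts["hom_ref"],
            "n_hom_alt": counts["hom_alt"],
            # Assumed: called genotypes that are not homozygous reference.
            "n_non_ref": n_called - counts["hom_ref"],
        })
    return stats


# Four variants, two diploid samples.
genotypes = [
    [(0, 0), (1, 1)],
    [(0, 1), (1, 1)],
    [(1, 1), (-1, -1)],  # sample 1: fully missing call
    [(0, -1), (0, 0)],   # sample 0: partial call, ignored
]
stats = sample_stats_sketch(genotypes)
for s, row in enumerate(stats):
    print(s, row)
```

In this toy input each sample has one ignored call, so both have three called genotypes out of four variants even though sample 0's ignored call still carries one known allele.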
Note
When used on autopolyploid genotypes, this method treats genotype calls with any level of heterozygosity as ‘heterozygous’. Only fully homozygous genotype calls (e.g. 0/0/0/0) will be classified as ‘homozygous’.
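As a rough illustration of this rule, the sketch below labels a few tetraploid calls. It is plain Python, not sgkit code. The `classify` helper is hypothetical and again assumes -1 marks a missing allele.

```python
# Illustrative sketch only: how the rule above applies to tetraploid calls.
# Assumption: a missing allele is encoded as -1.

def classify(alleles):
    """Label a call as 'homozygous', 'heterozygous', or 'ignored'."""
    if any(a < 0 for a in alleles):
        return "ignored"  # partial or missing calls are not counted
    # Any mixture of alleles counts as heterozygous, regardless of dosage.
    return "homozygous" if len(set(alleles)) == 1 else "heterozygous"


tetraploid_calls = {
    "0/0/0/0": (0, 0, 0, 0),
    "0/0/0/1": (0, 0, 0, 1),
    "0/0/1/1": (0, 0, 1, 1),
    "0/1/1/1": (0, 1, 1, 1),
    "1/1/1/1": (1, 1, 1, 1),
    "0/0/1/.": (0, 0, 1, -1),
}
labels = {name: classify(alleles) for name, alleles in tetraploid_calls.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Under this rule the three dosage classes 0/0/0/1, 0/0/1/1 and 0/1/1/1 are indistinguishable in the heterozygous count, so the statistic does not capture allele dosage.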
Warning
This method does not support mixed-ploidy datasets.
- Raises:
ValueError – If the dataset contains mixed-ploidy genotype calls.