sgkit.sample_stats
- sgkit.sample_stats(ds, *, call_genotype_mask='call_genotype_mask', call_genotype='call_genotype', variant_allele_count='variant_allele_count', merge=True)
Compute quality control sample statistics from genotype calls.
- Parameters
- ds : Dataset
Dataset containing genotype calls.
- call_genotype : Hashable (default: 'call_genotype')
Input variable name holding call_genotype. Defined by sgkit.variables.call_genotype_spec. Must be present in ds.
- call_genotype_mask : Hashable (default: 'call_genotype_mask')
Input variable name holding call_genotype_mask. Defined by sgkit.variables.call_genotype_mask_spec. Must be present in ds.
- variant_allele_count : Hashable (default: 'variant_allele_count')
Input variable name holding variant_allele_count, as defined by sgkit.variables.variant_allele_count_spec. If the variable is not present in ds, it will be computed using count_variant_alleles().
- merge : bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.
- Return type
Dataset
- Returns
A dataset containing the following variables:
- sgkit.variables.sample_n_called_spec (samples): The number of variants with called genotypes.
- sgkit.variables.sample_call_rate_spec (samples): The fraction of variants with called genotypes.
- sgkit.variables.sample_n_het_spec (samples): The number of variants with heterozygous calls.
- sgkit.variables.sample_n_hom_ref_spec (samples): The number of variants with homozygous reference calls.
- sgkit.variables.sample_n_hom_alt_spec (samples): The number of variants with homozygous alternate calls.
- sgkit.variables.sample_n_non_ref_spec (samples): The number of variants that are not homozygous reference calls.
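
The per-sample statistics listed above can be illustrated with a small, dependency-free sketch. This is not sgkit's implementation, and it does not operate on a Dataset. The helper `toy_sample_stats` and its input layout are hypothetical. It assumes that allele 0 is the reference, that -1 marks a missing allele, that a genotype with any missing allele counts as uncalled, and that the non-reference count is taken over called genotypes only.

```python
# Illustrative only: a plain-Python sketch of the per-sample statistics
# described above, NOT sgkit's implementation.
# Assumptions: calls[v][s] is a tuple of allele indices for variant v and
# sample s, allele 0 is the reference, and -1 marks a missing allele.

def toy_sample_stats(calls):
    """Return a dict of per-sample lists keyed by the output variable names."""
    n_variants = len(calls)
    n_samples = len(calls[0]) if calls else 0
    names = ("sample_n_called", "sample_n_het", "sample_n_hom_ref", "sample_n_hom_alt")
    stats = {name: [0] * n_samples for name in names}
    for variant in calls:
        for s, alleles in enumerate(variant):
            if any(a < 0 for a in alleles):
                continue  # assumed: any missing allele makes the call uncalled
            stats["sample_n_called"][s] += 1
            if len(set(alleles)) > 1:
                stats["sample_n_het"][s] += 1      # two different alleles
            elif alleles[0] == 0:
                stats["sample_n_hom_ref"][s] += 1  # all alleles are reference
            else:
                stats["sample_n_hom_alt"][s] += 1  # all alleles are the same alternate
    # Assumed: non-reference counts are restricted to called genotypes.
    stats["sample_n_non_ref"] = [
        called - hom_ref
        for called, hom_ref in zip(stats["sample_n_called"], stats["sample_n_hom_ref"])
    ]
    stats["sample_call_rate"] = [
        called / n_variants if n_variants else float("nan")
        for called in stats["sample_n_called"]
    ]
    return stats


# Three variants (rows) by two samples (columns), diploid calls.
calls = [
    [(0, 0), (0, 1)],
    [(1, 1), (-1, -1)],
    [(0, 1), (0, 0)],
]
result = toy_sample_stats(calls)
print(result["sample_n_called"])   # [3, 2]
print(result["sample_n_non_ref"])  # [2, 1]
```

With sgkit itself, the equivalent step is a call to sgkit.sample_stats(ds) on a Dataset that holds call_genotype and call_genotype_mask. Passing merge=False returns only the computed variables.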