sgkit.sample_stats
- sgkit.sample_stats(ds, *, call_genotype_mask='call_genotype_mask', call_genotype='call_genotype', variant_allele_count='variant_allele_count', merge=True)
Compute quality control sample statistics from genotype calls.
- Parameters
- ds :
Dataset Dataset containing genotype calls.
- call_genotype :
Hashable (default: 'call_genotype') Input variable name holding call_genotype. Defined by sgkit.variables.call_genotype_spec. Must be present in ds.
- call_genotype_mask :
Hashable (default: 'call_genotype_mask') Input variable name holding call_genotype_mask. Defined by sgkit.variables.call_genotype_mask_spec. Must be present in ds.
- variant_allele_count :
Hashable (default: 'variant_allele_count') Input variable name holding variant_allele_count, as defined by sgkit.variables.variant_allele_count_spec. If the variable is not present in ds, it will be computed using count_variant_alleles().
- merge :
bool (default: True) If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type
Dataset
- Returns
A dataset containing the following variables:
- sgkit.variables.sample_n_called_spec (samples): The number of variants with called genotypes.
- sgkit.variables.sample_call_rate_spec (samples): The fraction of variants with called genotypes.
- sgkit.variables.sample_n_het_spec (samples): The number of variants with heterozygous calls.
- sgkit.variables.sample_n_hom_ref_spec (samples): The number of variants with homozygous reference calls.
- sgkit.variables.sample_n_hom_alt_spec (samples): The number of variants with homozygous alternate calls.
- sgkit.variables.sample_n_non_ref_spec (samples): The number of variants that are not homozygous reference calls.
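To make the definitions of the output variables concrete, the following is a minimal NumPy sketch of how each per-sample statistic could be derived from a small genotype array. It is an illustration only and not sgkit's implementation. It assumes diploid calls with shape (variants, samples, ploidy), alleles encoded as 0 for reference and 1 for alternate, missing alleles encoded as -1, and a call counted as uncalled if any of its alleles is missing. The array contents and the local variable names are made up for the example.

```python
import numpy as np

# Hypothetical genotype calls, shape (variants, samples, ploidy).
# Assumed encoding: 0 = reference, 1 = alternate, -1 = missing allele.
call_genotype = np.array(
    [
        [[0, 0], [0, 1], [1, 1]],
        [[0, 1], [-1, -1], [1, 1]],
        [[0, 0], [0, 0], [1, 0]],
        [[1, 1], [0, 1], [-1, -1]],
    ]
)
n_variants = call_genotype.shape[0]

# Assumption: a call is "called" only if none of its alleles is missing.
called = (call_genotype >= 0).all(axis=-1)

# A called genotype is homozygous when every allele equals the first one.
hom = called & (call_genotype == call_genotype[..., :1]).all(axis=-1)
hom_ref = hom & (call_genotype[..., 0] == 0)
hom_alt = hom & (call_genotype[..., 0] > 0)
het = called & ~hom

# Reduce over the variants axis to get one value per sample.
sample_n_called = called.sum(axis=0)
sample_call_rate = sample_n_called / n_variants
sample_n_het = het.sum(axis=0)
sample_n_hom_ref = hom_ref.sum(axis=0)
sample_n_hom_alt = hom_alt.sum(axis=0)
# "Not homozygous reference" among called genotypes.
sample_n_non_ref = sample_n_called - sample_n_hom_ref

for name, values in [
    ("sample_n_called", sample_n_called),
    ("sample_call_rate", sample_call_rate),
    ("sample_n_het", sample_n_het),
    ("sample_n_hom_ref", sample_n_hom_ref),
    ("sample_n_hom_alt", sample_n_hom_alt),
    ("sample_n_non_ref", sample_n_non_ref),
]:
    print(name, values.tolist())
```

Each result has one entry per sample, matching the (samples) dimension listed for the output variables. Whether sgkit counts partially missing calls the same way is an assumption of this sketch and should be checked against the library before relying on it.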