sgkit.count_variant_alleles
- sgkit.count_variant_alleles(ds, *, call_genotype='call_genotype', call_allele_count='call_allele_count', using='call_allele_count', merge=True)
Compute per-variant allele counts from per-sample allele counts or from genotype calls.
- Parameters:
- ds (Dataset)
Dataset containing genotype calls.
- call_genotype (Hashable, default: 'call_genotype')
Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. This variable is only used if specified by the 'using' argument.
- call_allele_count (Hashable, default: 'call_allele_count')
Input variable name holding call_allele_count as defined by sgkit.variables.call_allele_count_spec. This variable is only used if specified by the 'using' argument. If the variable is not present in ds, it will be computed using count_call_alleles().
- using (Literal['call_allele_count', 'call_genotype'], default: 'call_allele_count')
Specifies the variable from which allele counts are calculated. If 'call_allele_count' (the default), the result is calculated from the call_allele_count variable. If 'call_genotype', the result is calculated from the call_genotype variable.
- merge (bool, default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type:
Dataset
- Returns:
A dataset containing sgkit.variables.variant_allele_count_spec of allele counts with shape (variants, alleles) and values corresponding to the number of non-missing occurrences of each allele.
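To make "number of non-missing occurrences of each allele" concrete, here is a minimal pure-Python sketch of the counting rule. It is not sgkit's implementation: the helper `count_alleles` is hypothetical, and the sketch assumes that missing alleles are encoded as negative integers (for example -1) in the genotype calls.

```python
from typing import List


def count_alleles(genotypes: List[List[int]], n_alleles: int) -> List[int]:
    """Hypothetical helper (not part of sgkit): count alleles for one variant."""
    counts = [0] * n_alleles
    for call in genotypes:      # one call per sample
        for allele in call:     # one allele index per ploidy slot
            if allele >= 0:     # assumption: negative values mark missing alleles
                counts[allele] += 1
    return counts


# Two variants, two diploid samples; -1 marks a missing allele (assumed encoding).
call_genotype = [
    [[1, 0], [1, 1]],
    [[0, -1], [0, 0]],
]

# Result has shape (variants, alleles).
variant_allele_count = [count_alleles(v, n_alleles=2) for v in call_genotype]
print(variant_allele_count)  # [[1, 3], [3, 0]]
```

In the second variant the missing allele is skipped, so the row sums to 3 rather than 4.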
Note
This method is more efficient when calculating allele counts directly from the call_genotype variable unless the call_allele_count variable has already been (or will be) calculated.
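The trade-off in the note can be illustrated with a pure-Python sketch of the two routes. This is an illustration of the idea, not sgkit's code: it assumes that call_allele_count is a per-call intermediate with shape (variants, samples, alleles) and that negative values mark missing alleles. The genotypes are the ones shown in the Examples section.

```python
# Genotypes from the Examples section: 4 variants x 2 samples x ploidy 2.
call_genotype = [
    [[1, 0], [1, 0]],
    [[1, 0], [1, 1]],
    [[0, 1], [1, 0]],
    [[0, 0], [0, 0]],
]
N_ALLELES = 2


def per_call_counts(call):
    """Hypothetical helper: allele counts for a single sample's call."""
    counts = [0] * N_ALLELES
    for allele in call:
        if allele >= 0:  # assumption: negative values mark missing alleles
            counts[allele] += 1
    return counts


# Route 1 (like using='call_allele_count'): build the per-call intermediate
# of shape (variants, samples, alleles), then sum over samples.
call_allele_count = [[per_call_counts(c) for c in variant] for variant in call_genotype]
via_call_counts = [
    [sum(sample[a] for sample in variant) for a in range(N_ALLELES)]
    for variant in call_allele_count
]

# Route 2 (like using='call_genotype'): count directly, with no intermediate.
direct = []
for variant in call_genotype:
    counts = [0] * N_ALLELES
    for call in variant:
        for allele in call:
            if allele >= 0:
                counts[allele] += 1
    direct.append(counts)

print(via_call_counts)  # [[2, 2], [1, 3], [2, 2], [4, 0]]
print(direct)           # [[2, 2], [1, 3], [2, 2], [4, 0]]
```

Both routes give the same counts. The first pays for an extra per-sample array, which is only worthwhile when that array is needed elsewhere.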
Examples
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=4, n_sample=2, seed=1)
>>> sg.display_genotypes(ds)
samples    S0   S1
variants
0         1/0  1/0
1         1/0  1/1
2         0/1  1/0
3         0/0  0/0

>>> sg.count_variant_alleles(ds)["variant_allele_count"].values
array([[2, 2],
       [1, 3],
       [2, 2],
       [4, 0]], dtype=uint64)