sgkit.count_variant_alleles

sgkit.count_variant_alleles(ds, *, call_genotype='call_genotype', call_allele_count='call_allele_count', using='call_allele_count', merge=True)

Compute per-variant allele counts from per-sample allele counts or from genotype calls.

Parameters:
ds Dataset

Dataset containing genotype calls.

call_genotype Hashable (default: 'call_genotype')

Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. This variable is only used if specified by the ‘using’ argument.

call_allele_count Hashable (default: 'call_allele_count')

Input variable name holding call_allele_count as defined by sgkit.variables.call_allele_count_spec. This variable is only used if specified by the ‘using’ argument. If the variable is not present in ds, it will be computed using count_call_alleles().

using Literal['call_allele_count', 'call_genotype'] (default: 'call_allele_count')

Specifies the variable from which allele counts are calculated. If 'call_allele_count' (the default), the result is calculated from the call_allele_count variable. If 'call_genotype', the result is calculated from the call_genotype variable.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing the variable defined by sgkit.variables.variant_allele_count_spec: allele counts with shape (variants, alleles), where each value is the number of non-missing occurrences of that allele.
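
To make the meaning of "non-missing occurrences" concrete, here is a minimal, dependency-free sketch of the counting rule. It is an illustration only, not sgkit's implementation: the helper name count_alleles_per_variant is hypothetical, the genotypes are plain nested lists indexed [variant][sample][ploidy] rather than a Dataset, and it assumes missing alleles are encoded as negative integers (such as -1).

```python
def count_alleles_per_variant(genotypes, n_alleles):
    """Hypothetical helper, not part of sgkit.

    genotypes: nested lists indexed [variant][sample][ploidy].
    Returns nested lists indexed [variant][allele].
    """
    counts = []
    for variant in genotypes:
        row = [0] * n_alleles
        for call in variant:
            for allele in call:
                # Assumption: negative values mark missing alleles and are skipped.
                if allele >= 0:
                    row[allele] += 1
        counts.append(row)
    return counts


genotypes = [
    [[1, 0], [1, 0]],
    [[1, 0], [1, 1]],
    [[0, 1], [-1, 0]],  # one allele of the second sample is missing
    [[0, 0], [0, 0]],
]
variant_allele_count = count_alleles_per_variant(genotypes, n_alleles=2)
print(variant_allele_count)
```

In this sketch the third variant has only three counted alleles in total, because the missing allele contributes to neither column.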

Note

Calculating allele counts directly from the call_genotype variable (using='call_genotype') is more efficient, unless the call_allele_count variable has already been (or will be) calculated.
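
The two routes selected by the using argument can be pictured with a small pure-Python sketch. It is a conceptual model under stated assumptions, not sgkit's code: the three helper functions are hypothetical, and the inputs are nested lists indexed [variant][sample][ploidy]. One route first builds per-sample counts, indexed [variant][sample][allele], and then sums over samples; the other tallies alleles per variant in a single pass without that intermediate.

```python
def per_sample_allele_counts(genotypes, n_alleles):
    # Hypothetical stand-in for a per-sample count step.
    # Result is indexed [variant][sample][allele].
    return [
        [[sum(1 for a in call if a == k) for k in range(n_alleles)] for call in variant]
        for variant in genotypes
    ]


def sum_over_samples(call_allele_count):
    # Collapse the sample axis: add up each allele column within a variant.
    return [[sum(column) for column in zip(*variant)] for variant in call_allele_count]


def count_directly(genotypes, n_alleles):
    # Single pass over all alleles of a variant, no per-sample intermediate.
    return [
        [sum(1 for call in variant for a in call if a == k) for k in range(n_alleles)]
        for variant in genotypes
    ]


genotypes = [
    [[1, 0], [1, 0]],
    [[1, 0], [1, 1]],
    [[0, 1], [1, 0]],
    [[0, 0], [0, 0]],
]
via_call_counts = sum_over_samples(per_sample_allele_counts(genotypes, 2))
direct = count_directly(genotypes, 2)
print(via_call_counts == direct)
```

Both routes give the same per-variant counts here. The direct route avoids materializing the larger per-sample structure, which is one plausible reading of why it is described as more efficient when the per-sample counts are not otherwise needed.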

Examples

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=4, n_sample=2, seed=1)
>>> sg.display_genotypes(ds) 
samples    S0   S1
variants
0         1/0  1/0
1         1/0  1/1
2         0/1  1/0
3         0/0  0/0
>>> sg.count_variant_alleles(ds)["variant_allele_count"].values 
array([[2, 2],
       [1, 3],
       [2, 2],
       [4, 0]], dtype=uint64)