sgkit.count_variant_genotypes

sgkit.count_variant_genotypes(ds, *, call_genotype='call_genotype', genotype_id='genotype_id', assign_coords=True, merge=True)

Count the number of calls of each possible genotype, at each variant.

The “possible genotypes” at a given variant locus are all unordered combinations, with repetition, of the alleles at that locus, taken ploidy at a time (i.e., all multisets of those alleles with cardinality equal to the ploidy).
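The multiset definition above can be illustrated with the standard library alone. This is a minimal sketch, not sgkit's implementation: the helper name `possible_genotypes` is hypothetical, and the lexicographic order it produces is not necessarily the genotype ordering sgkit uses (see the variable documentation for that).

```python
from itertools import combinations_with_replacement
from math import comb


def possible_genotypes(n_alleles, ploidy):
    """Enumerate all multisets of allele indices with cardinality `ploidy`.

    Hypothetical helper for illustration only; not part of sgkit.
    """
    return list(combinations_with_replacement(range(n_alleles), ploidy))


# Diploid, biallelic locus: three possible genotypes.
diploid_biallelic = possible_genotypes(n_alleles=2, ploidy=2)
print(diploid_biallelic)  # [(0, 0), (0, 1), (1, 1)]

# The number of multisets is C(n_alleles + ploidy - 1, ploidy).
assert len(diploid_biallelic) == comb(2 + 2 - 1, 2)
```

For a diploid, biallelic dataset this gives three genotypes, which matches the three columns of the count array in the example at the end of this page.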

Parameters:
ds Dataset

Dataset containing genotype calls.

call_genotype Hashable (default: 'call_genotype')

Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. Must be present in ds.

genotype_id Hashable (default: 'genotype_id')

Input variable name holding genotype ids as defined by sgkit.variables.genotype_id_spec. If this variable is not present in ds, it will be computed automatically.

assign_coords bool (default: True)

If True (the default) then the genotype_id array will be assigned as the coordinates for the “genotypes” dimension in the returned dataset.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.variant_genotype_count_spec of genotype counts with shape (variants, genotypes). Refer to the variable documentation for examples of genotype ordering.

Warning

This method does not support mixed-ploidy datasets.

Raises:

ValueError – If the dataset contains mixed-ploidy genotype calls.

Examples

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=4, n_sample=2, seed=1)
>>> sg.display_genotypes(ds) 
samples    S0   S1
variants
0         1/0  1/0
1         1/0  1/1
2         0/1  1/0
3         0/0  0/0
>>> sg.count_variant_genotypes(ds)["variant_genotype_count"].values 
array([[0, 2, 0],
       [0, 1, 1],
       [0, 2, 0],
       [2, 0, 0]], dtype=uint64)
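To make the output above easier to interpret, the same counts can be reproduced by hand with NumPy. This is only a sketch under stated assumptions: it hard-codes the genotype calls displayed above, it assumes diploid, biallelic calls with no missing alleles, and it is not how sgkit computes the result. Under those assumptions, the number of alternate alleles in a call (0, 1 or 2) indexes the genotypes 0/0, 0/1 and 1/1.

```python
import numpy as np

# Genotype calls copied from the displayed example,
# with shape (variants, samples, ploidy).
call_genotype = np.array([
    [[1, 0], [1, 0]],
    [[1, 0], [1, 1]],
    [[0, 1], [1, 0]],
    [[0, 0], [0, 0]],
])

# Assumption: diploid, biallelic, no missing alleles. The alternate-allele
# count per call then serves as the genotype index (0/0 -> 0, 0/1 -> 1, 1/1 -> 2).
alt_count = call_genotype.sum(axis=-1)

# Count how many samples carry each genotype, variant by variant.
counts = np.stack([np.bincount(row, minlength=3) for row in alt_count])
print(counts)
```

Note that phase is ignored: the calls 1/0 and 0/1 are both counted in the middle column.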