sgkit.count_variant_genotypes
- sgkit.count_variant_genotypes(ds, *, call_genotype='call_genotype', genotype_id='genotype_id', assign_coords=True, merge=True)
Count the number of calls of each possible genotype, at each variant.
The “possible genotypes” at a given variant locus include all possible combinations of the alleles at that locus, of size ploidy (i.e., all multisets of those alleles with cardinality <ploidy>).
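As a rough illustration of what "all multisets of alleles with cardinality ploidy" means, the sketch below enumerates them with the standard library. The helper name `possible_genotypes` is hypothetical and not part of sgkit, and the lexicographic order produced here is an assumption of this sketch; it is not guaranteed to match the genotype ordering sgkit uses for multi-allelic variants.

```python
from itertools import combinations_with_replacement
from math import comb


def possible_genotypes(n_alleles: int, ploidy: int) -> list[tuple[int, ...]]:
    """Enumerate every multiset of allele indices with size `ploidy`.

    Hypothetical helper for illustration only (not an sgkit function).
    The order is lexicographic, which may differ from sgkit's ordering.
    """
    return list(combinations_with_replacement(range(n_alleles), ploidy))


# Diploid, biallelic locus: hom-ref, het, hom-alt.
print(possible_genotypes(n_alleles=2, ploidy=2))  # [(0, 0), (0, 1), (1, 1)]

# The number of multisets is C(n_alleles + ploidy - 1, ploidy).
triallelic = possible_genotypes(n_alleles=3, ploidy=2)
print(len(triallelic), comb(3 + 2 - 1, 2))  # 6 6
```

For a diploid, biallelic dataset this gives three possible genotypes per variant, which is why the genotype count array in the example at the end of this page has three columns.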
- Parameters:
- ds (Dataset)
Dataset containing genotype calls.
- call_genotype (Hashable, default: 'call_genotype')
Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. Must be present in ds.
- genotype_id (Hashable, default: 'genotype_id')
Input variable name holding genotype ids as defined by sgkit.variables.call_genotype_spec. If this variable is not present in ds it will be automatically computed.
- assign_coords (bool, default: True)
If True (the default) then the genotype_id array will be assigned as the coordinates for the “genotypes” dimension in the returned dataset.
- merge (bool, default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type:
Dataset
- Returns:
A dataset containing sgkit.variables.variant_genotype_count_spec of genotype counts with shape (variants, genotypes). Refer to the variable documentation for examples of genotype ordering.
Warning
This method does not support mixed-ploidy datasets.
- Raises:
ValueError – If the dataset contains mixed-ploidy genotype calls.
Examples
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=4, n_sample=2, seed=1)
>>> sg.display_genotypes(ds)
samples    S0   S1
variants
0         1/0  1/0
1         1/0  1/1
2         0/1  1/0
3         0/0  0/0

>>> sg.count_variant_genotypes(ds)["variant_genotype_count"].values
array([[0, 2, 0],
       [0, 1, 1],
       [0, 2, 0],
       [2, 0, 0]], dtype=uint64)
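To make the counting step concrete, here is a minimal NumPy sketch that reproduces the counts above without calling sgkit. It is not sgkit's implementation: the function `count_genotypes` is hypothetical, the `calls` array is copied by hand from the genotypes displayed in the example, and the sketch assumes a fixed ploidy with no missing calls.

```python
from itertools import combinations_with_replacement

import numpy as np

# calls[variant, sample, ploidy], copied from the displayed genotypes above.
calls = np.array(
    [
        [[1, 0], [1, 0]],
        [[1, 0], [1, 1]],
        [[0, 1], [1, 0]],
        [[0, 0], [0, 0]],
    ]
)


def count_genotypes(calls: np.ndarray, n_alleles: int) -> np.ndarray:
    """Count unphased genotypes per variant (hypothetical helper).

    Assumes fixed ploidy and no missing alleles; a negative allele value
    would raise a KeyError in the lookup below.
    """
    n_variants, n_samples, ploidy = calls.shape
    genotypes = list(combinations_with_replacement(range(n_alleles), ploidy))
    index = {g: i for i, g in enumerate(genotypes)}
    counts = np.zeros((n_variants, len(genotypes)), dtype=np.int64)
    for v in range(n_variants):
        for s in range(n_samples):
            # Sorting the alleles ignores phase, so 1/0 and 0/1 are both (0, 1).
            key = tuple(sorted(int(a) for a in calls[v, s]))
            counts[v, index[key]] += 1
    return counts


counts = count_genotypes(calls, n_alleles=2)
print(counts.tolist())  # [[0, 2, 0], [0, 1, 1], [0, 2, 0], [2, 0, 0]]
```

Each row corresponds to a variant and each column to one of the genotypes 0/0, 0/1, 1/1. The explicit loops are chosen for readability rather than speed.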