sgkit.convert_call_to_index

sgkit.convert_call_to_index(ds, *, call_genotype='call_genotype', merge=True)

Convert each call genotype to a single integer value.

Parameters:
ds Dataset

Dataset containing genotype calls.

call_genotype Hashable (default: 'call_genotype')

Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec. Must be present in ds.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.call_genotype_index_spec and sgkit.variables.call_genotype_index_mask_spec. A genotype call with any missing allele results in an index of -1.

Warning

This method does not support mixed-ploidy datasets.

Raises:

ValueError – If the dataset contains mixed-ploidy genotype calls.

Examples

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(
...     n_variant=4,
...     n_sample=2,
...     missing_pct=0.05,
...     seed=1,
... )
>>> sg.display_genotypes(ds) 
samples    S0   S1
variants
0         ./0  1/0
1         1/0  1/1
2         0/1  1/0
3         ./0  0/0
>>> sg.convert_call_to_index(ds)["call_genotype_index"].values 
array([[-1,  1],
       [ 1,  2],
       [ 1,  1],
       [-1,  0]]...)

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(
...     n_variant=4,
...     n_sample=2,
...     n_allele=10,
...     missing_pct=0.05,
...     seed=1,
... )
>>> sg.display_genotypes(ds) 
samples    S0   S1
variants
0         5/4  1/0
1         7/7  8/8
2         4/7  ./9
3         3/0  5/5
>>> sg.convert_call_to_index(ds)["call_genotype_index"].values 
array([[19,  1],
       [35, 44],
       [32, -1],
       [ 6, 20]]...)
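
The index values in the examples above are consistent with the VCF-style ordering of unphased genotypes, in which the i-th smallest allele of a call (counting from 1) contributes C(allele + i - 1, i) to the index. The following is a minimal pure-Python sketch under that assumption, not sgkit's actual implementation. The helper name `genotype_index` is hypothetical, and treating every negative allele as missing is also an assumption.

```python
from math import comb


def genotype_index(alleles):
    """Hypothetical helper: map one genotype call to a single integer.

    Assumes VCF-style ordering of unphased genotypes. Any negative
    allele is treated as missing and yields -1 (an assumption that
    matches the examples above).
    """
    if any(a < 0 for a in alleles):
        return -1
    # Sort so that allele order is ignored (1/0 and 0/1 give the same index).
    ordered = sorted(alleles)
    # With 0-based position i, each allele a contributes C(a + i, i + 1).
    return sum(comb(a + i, i + 1) for i, a in enumerate(ordered))


# Genotype calls from the second example above (-1 stands for a missing allele).
calls = [
    [(5, 4), (1, 0)],
    [(7, 7), (8, 8)],
    [(4, 7), (-1, 9)],
    [(3, 0), (5, 5)],
]
indices = [[genotype_index(call) for call in row] for row in calls]
print(indices)  # [[19, 1], [35, 44], [32, -1], [6, 20]]
```

For diploid calls the sum reduces to `a + b * (b + 1) // 2` for sorted alleles `a <= b`, so 4/5 gives 4 + 15 = 19. Because the alleles are sorted first, the index discards phase information.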