sgkit.individual_heterozygosity

sgkit.individual_heterozygosity(ds, *, call_allele_count='call_allele_count', merge=True)

Compute per-call individual heterozygosity.

Individual heterozygosity is the probability that two alleles drawn at random, without replacement, from an individual at a given site are not identical in state. Because two alleles must be drawn, it is defined only for diploid and polyploid calls; haploid calls yield nan.
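The definition above can be worked through for a single call using only its allele counts. The following is a minimal pure-Python sketch derived from that definition, not sgkit's implementation; the helper name call_heterozygosity_sketch is hypothetical and is not part of the sgkit API.

```python
import math


def call_heterozygosity_sketch(allele_counts):
    """Probability that two alleles drawn without replacement differ.

    Hypothetical helper for illustration only (not an sgkit function).
    `allele_counts` holds the number of copies of each allele in one call,
    e.g. [1, 1] for a diploid heterozygote or [0, 2] for a homozygote.
    """
    ploidy = sum(allele_counts)
    if ploidy < 2:
        # Two alleles cannot be drawn from a haploid (or empty) call.
        return math.nan
    # Ordered pairs of draws in which both alleles are the same allele.
    identical_pairs = sum(n * (n - 1) for n in allele_counts)
    # All ordered pairs of draws without replacement.
    total_pairs = ploidy * (ploidy - 1)
    return 1.0 - identical_pairs / total_pairs


print(call_heterozygosity_sketch([1, 1]))  # diploid heterozygote
print(call_heterozygosity_sketch([0, 2]))  # diploid homozygote
print(call_heterozygosity_sketch([3, 1]))  # tetraploid call, three copies of one allele
print(call_heterozygosity_sketch([1, 0]))  # haploid call
```

For diploid calls the result can only be 0 or 1, whereas polyploid calls can take intermediate values.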

Parameters
ds Dataset

Dataset containing genotype calls.

call_allele_count Hashable (default: 'call_allele_count')

Input variable name holding call_allele_count as defined by sgkit.variables.call_allele_count_spec. If the variable is not present in ds, it will be computed using count_call_alleles().

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.

Return type

Dataset

Returns

A dataset containing the variable defined by sgkit.variables.call_heterozygosity_spec: per-genotype observed heterozygosity with shape (variants, samples), with values in the interval [0, 1], or nan where ploidy < 2.

Examples

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=4, n_sample=2, seed=1)
>>> sg.display_genotypes(ds) 
samples    S0   S1
variants
0         1/0  1/0
1         1/0  1/1
2         0/1  1/0
3         0/0  0/0
>>> sg.individual_heterozygosity(ds)["call_heterozygosity"].values 
array([[1., 1.],
       [1., 0.],
       [1., 1.],
       [0., 0.]])