sgkit.individual_heterozygosity

sgkit.individual_heterozygosity(ds, *, call_allele_count='call_allele_count', merge=True)

Compute per-call individual heterozygosity.

Individual heterozygosity is the probability that two alleles drawn at random, without replacement, from an individual at a given site are not identical in state. Because two alleles must be drawn, individual heterozygosity is defined only for diploid and polyploid calls; for haploid calls the result is nan.
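The definition above can be written directly in terms of allele counts. If a call carries k alleles and allele i appears c_i times, the probability that two draws without replacement carry the same allele is sum(c_i * (c_i - 1)) / (k * (k - 1)), and heterozygosity is one minus that value. The following is a minimal NumPy sketch of that formula, not sgkit's implementation; `heterozygosity_from_counts` is a hypothetical helper, and its input stands in for the `call_allele_count` variable.

```python
import numpy as np

def heterozygosity_from_counts(allele_counts):
    """Probability that two alleles drawn without replacement differ in state.

    allele_counts: array of shape (..., n_alleles) with per-call allele counts
    (a hypothetical stand-in for sgkit's call_allele_count variable).
    """
    ac = np.asarray(allele_counts, dtype=np.float64)
    k = ac.sum(axis=-1)  # number of alleles counted in each call
    # Ordered pairs of distinct allele copies; undefined (nan) when k < 2.
    pairs = np.where(k > 1, k * (k - 1), np.nan)
    # Probability that both draws carry the same allele, summed over alleles.
    same = (ac * (ac - 1)).sum(axis=-1) / pairs
    return 1.0 - same

print(heterozygosity_from_counts([1, 1]))  # diploid 0/1 -> 1.0
print(heterozygosity_from_counts([0, 2]))  # diploid 1/1 -> 0.0
print(heterozygosity_from_counts([2, 2]))  # tetraploid 0/0/1/1 -> approximately 0.667
print(heterozygosity_from_counts([1, 0]))  # haploid -> nan
```

Using nan as the denominator for k < 2 makes the haploid case propagate to nan without a divide-by-zero warning.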

Parameters:
ds Dataset

Dataset containing genotype calls.

call_allele_count Hashable (default: 'call_allele_count')

Input variable name holding call_allele_count as defined by sgkit.variables.call_allele_count_spec. If the variable is not present in ds, it will be computed using count_call_alleles().
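When `call_allele_count` is absent, it is derived from the genotype calls by `count_call_alleles()`. As an illustration of what that variable holds, here is a NumPy sketch (not sgkit's code) that counts how often each allele index appears in each call. `count_alleles` is a hypothetical helper, and treating negative allele values as missing and leaving them uncounted is an assumption of this sketch.

```python
import numpy as np

def count_alleles(genotypes, n_alleles):
    """Count allele occurrences per call.

    genotypes: int array of shape (variants, samples, ploidy); negative values
    are assumed to mark missing alleles and never match an allele index.
    Returns an int array of shape (variants, samples, n_alleles).
    """
    gt = np.asarray(genotypes)
    alleles = np.arange(n_alleles)
    # Compare every allele slot with every allele index, then sum over ploidy.
    return (gt[..., None] == alleles).sum(axis=-2)

print(count_alleles([[[1, 0], [1, 1]]], n_alleles=2).tolist())  # [[[1, 1], [0, 2]]]
print(count_alleles([[[0, -1]]], n_alleles=2).tolist())         # [[[1, 0]]]
```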

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.call_heterozygosity_spec, the per-genotype observed heterozygosity, with shape (variants, samples) and values in the interval [0, 1], or nan where ploidy < 2.

Examples

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=4, n_sample=2, seed=1)
>>> sg.display_genotypes(ds) 
samples    S0   S1
variants
0         1/0  1/0
1         1/0  1/1
2         0/1  1/0
3         0/0  0/0
>>> sg.individual_heterozygosity(ds)["call_heterozygosity"].values 
array([[1., 1.],
       [1., 0.],
       [1., 1.],
       [0., 0.]])
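For diploid calls with no missing alleles, the formula reduces to an indicator: heterozygosity is 1 when the two alleles differ and 0 when they match. The NumPy snippet below uses that shortcut to cross-check the output above, with the genotypes transcribed by hand from the `display_genotypes()` table. It is a sanity check for this example only and does not handle polyploid, haploid, or missing calls.

```python
import numpy as np

# Genotypes transcribed from the display_genotypes() output above,
# shape (variants, samples, ploidy).
gt = np.array([
    [[1, 0], [1, 0]],
    [[1, 0], [1, 1]],
    [[0, 1], [1, 0]],
    [[0, 0], [0, 0]],
])

# Diploid shortcut: a call is heterozygous exactly when its two alleles differ.
het = (gt[..., 0] != gt[..., 1]).astype(float)
print(het)
```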