sgkit.identity_by_state

sgkit.identity_by_state(ds, *, call_allele_frequency='call_allele_frequency', skipna=True, merge=True)

Compute identity by state (IBS) probabilities between all pairs of samples.

The IBS probability between a pair of individuals is the probability that a randomly drawn allele from the first individual is identical in state with a randomly drawn allele from the second individual at a single random locus.
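
As a sketch, the definition above can be written as a formula. The symbols are notation introduced here for illustration, not names from sgkit: L is the number of loci, and p_{x,l,a} is the frequency of allele a within the call of sample x at locus l. It assumes every locus is equally likely to be drawn and that no allele frequencies are missing.

```latex
\mathrm{IBS}(x, y) = \frac{1}{L} \sum_{l=1}^{L} \sum_{a} p_{x,l,a} \, p_{y,l,a}
```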

Parameters
ds : Dataset

Dataset containing call genotype alleles.

call_allele_frequency : Hashable (default: 'call_allele_frequency')

Input variable name holding call_allele_frequency as defined by sgkit.variables.call_allele_frequency_spec. If the variable is not present in ds, it will be computed using call_allele_frequencies().

skipna : bool (default: True)

If True (the default), missing (NaN) allele frequencies are skipped.

merge : bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.

Return type

Dataset

Returns

A dataset containing sgkit.variables.stat_identity_by_state_spec, a matrix of pairwise IBS probabilities among all samples. The dimensions of the matrix are named samples_0 and samples_1.

Examples

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=2, n_sample=3, seed=2)
>>> sg.display_genotypes(ds) 
samples    S0   S1   S2
variants
0         0/0  1/1  1/0
1         1/1  1/1  1/0
>>> sg.identity_by_state(ds)["stat_identity_by_state"].values 
array([[1. , 0.5, 0.5],
       [0.5, 1. , 0.5],
       [0.5, 0.5, 0.5]])
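
The matrix above can be checked by hand. The following is a minimal pure-Python sketch, not sgkit's implementation. The names call_af and ibs_matrix are hypothetical, the allele frequencies are typed in from the displayed genotypes, and missing data is not handled. For each pair of samples it averages, over variants, the probability that one allele drawn from each sample matches.

```python
# Per-call allele frequencies, derived by hand from the genotypes shown above.
# Layout: call_af[variant][sample] = [frequency of allele 0, frequency of allele 1]
call_af = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # variant 0: 0/0, 1/1, 1/0
    [[0.0, 1.0], [0.0, 1.0], [0.5, 0.5]],  # variant 1: 1/1, 1/1, 1/0
]


def ibs_matrix(af):
    """Average over variants of the per-variant allele match probability.

    Hypothetical helper for illustration; assumes no missing frequencies.
    """
    n_variants = len(af)
    n_samples = len(af[0])
    result = [[0.0] * n_samples for _ in range(n_samples)]
    for x in range(n_samples):
        for y in range(n_samples):
            total = 0.0
            for v in range(n_variants):
                # Probability that an allele drawn from x matches one drawn from y
                total += sum(px * py for px, py in zip(af[v][x], af[v][y]))
            result[x][y] = total / n_variants
    return result


ibs = ibs_matrix(call_af)
for row in ibs:
    print(row)
```

The diagonal entry for S2 is 0.5 rather than 1 because S2 is heterozygous at both variants: two alleles drawn independently from a 1/0 call match only half the time.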