sgkit.identity_by_state
- sgkit.identity_by_state(ds, *, call_allele_frequency='call_allele_frequency', skipna=True, merge=True)
Compute identity by state (IBS) probabilities between all pairs of samples.
The IBS probability between a pair of individuals is the probability that a randomly drawn allele from the first individual is identical in state with a randomly drawn allele from the second individual at a single random locus.
- Parameters
- ds
Dataset
Dataset containing call genotype alleles.
- call_allele_frequency
Hashable
(default: 'call_allele_frequency') Input variable name holding call_allele_frequency as defined by sgkit.variables.call_allele_frequency_spec. If the variable is not present in ds, it will be computed using call_allele_frequencies().
- skipna
bool
(default: True) If True (the default), missing (NaN) allele frequencies will be skipped.
- merge
bool
(default: True) If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.
- Return type
Dataset
- Returns
A dataset containing sgkit.variables.stat_identity_by_state_spec, which is a matrix of pairwise IBS probabilities among all samples. The dimensions are named samples_0 and samples_1.
Examples
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=2, n_sample=3, seed=2)
>>> sg.display_genotypes(ds)
samples    S0   S1   S2
variants
0         0/0  1/1  1/0
1         1/1  1/1  1/0
>>> sg.identity_by_state(ds)["stat_identity_by_state"].values
array([[1. , 0.5, 0.5],
       [0.5, 1. , 0.5],
       [0.5, 0.5, 0.5]])
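To connect the definition at the top of this page to the matrix in the example, the following dependency-free sketch recomputes the values by hand. It is not sgkit's implementation: the helper names allele_frequencies and ibs_matrix are hypothetical, the genotypes are typed in from the displayed calls, and it assumes diploid, biallelic calls with no missing data (so the skipna handling is not modelled). For each pair of samples, it takes the dot product of their per-call allele frequencies at each variant and averages over variants.

```python
def allele_frequencies(call, n_allele=2):
    """Frequency of each allele within one genotype call, e.g. [1, 0] -> [0.5, 0.5]."""
    return [sum(1 for a in call if a == k) / len(call) for k in range(n_allele)]


def ibs_matrix(genotypes, n_allele=2):
    """Pairwise IBS probabilities; genotypes[v][s] lists the alleles of sample s at variant v."""
    n_variant = len(genotypes)
    n_sample = len(genotypes[0])
    # Per-variant, per-sample allele frequencies.
    freqs = [[allele_frequencies(call, n_allele) for call in row] for row in genotypes]
    ibs = [[0.0] * n_sample for _ in range(n_sample)]
    for x in range(n_sample):
        for y in range(n_sample):
            total = 0.0
            for v in range(n_variant):
                # Probability that one allele drawn from x matches one drawn from y at variant v.
                total += sum(p * q for p, q in zip(freqs[v][x], freqs[v][y]))
            # Average over variants, i.e. over a uniformly random locus.
            ibs[x][y] = total / n_variant
    return ibs


# Genotypes copied from the display_genotypes output above (rows: variants, columns: S0, S1, S2).
genotypes = [
    [[0, 0], [1, 1], [1, 0]],
    [[1, 1], [1, 1], [1, 0]],
]

for row in ibs_matrix(genotypes):
    print(row)
```

The printed rows agree with the array in the example. The value for S2 with itself is 0.5 rather than 1 because S2 is heterozygous at both variants, so two alleles drawn independently from S2 match only half the time.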