sgkit.Weir_Goudet_beta

sgkit.Weir_Goudet_beta(ds, *, stat_identity_by_state='stat_identity_by_state', merge=True)

Estimate pairwise beta between all pairs of samples as described in Weir and Goudet 2017 [1].

Beta is the kinship scaled by the average kinship of all pairs of individuals in the dataset such that the non-diagonal (non-self) values sum to zero.
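The scaling described above can be sketched with plain NumPy. This is a minimal illustration, not sgkit's implementation: it assumes beta is derived from a pairwise identity-by-state matrix as `(ibs - ibs_avg) / (1 - ibs_avg)`, where `ibs_avg` is the mean over distinct (non-self) pairs, and it ignores missing values. The matrix `ibs` and the names `ibs_avg` and `off_diagonal_sum` are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical pairwise identity-by-state (IBS) matrix for three samples.
ibs = np.array([
    [2 / 3, 1 / 6, 1 / 2],
    [1 / 6, 1 / 2, 1 / 3],
    [1 / 2, 1 / 3, 2 / 3],
])

# Average IBS over distinct pairs only (upper triangle, diagonal excluded).
upper = np.triu_indices_from(ibs, k=1)
ibs_avg = ibs[upper].mean()

# Scale every entry relative to that average (assumed form of the estimator).
beta = (ibs - ibs_avg) / (1 - ibs_avg)

# By construction, the non-self values sum to zero (up to rounding).
off_diagonal_sum = beta[upper].sum()
print(beta)
print(off_diagonal_sum)
```

Because the average is taken over non-self pairs only, the diagonal (self) values generally remain positive while the off-diagonal values are centered on zero.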

Beta may be corrected to more accurately reflect pedigree based kinship estimates using the formula \(\hat{\beta}^c=\frac{\hat{\beta}-\hat{\beta}_0}{1-\hat{\beta}_0}\) where \(\hat{\beta}_0\) is the estimated beta between samples which are known to be unrelated [1].
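As a rough sketch of this correction when specific pairs are known to be unrelated, the snippet below applies the formula to a small beta matrix. The list `unrelated_pairs` is hypothetical, and averaging beta over several such pairs to obtain `beta0` is an assumption made here for illustration, not something the function does for you.

```python
import numpy as np

# A small symmetric beta matrix, used here purely for illustration.
beta = np.array([
    [0.5, -0.25, 0.25],
    [-0.25, 0.25, 0.0],
    [0.25, 0.0, 0.5],
])

# Hypothetical prior knowledge: samples 0 and 1 are unrelated.
unrelated_pairs = [(0, 1)]
rows, cols = zip(*unrelated_pairs)

# Assumption: estimate beta0 as the mean beta over the known unrelated pairs.
beta0 = beta[list(rows), list(cols)].mean()

# Apply the correction so that the unrelated pairs map to zero.
beta_corrected = (beta - beta0) / (1 - beta0)
print(beta_corrected)
```

After the correction, the pairs used to estimate `beta0` have a corrected value of zero, and all other values are expressed relative to that baseline.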

Parameters:
ds Dataset

Genotype call dataset.

stat_identity_by_state Hashable (default: 'stat_identity_by_state')

Input variable name holding stat_identity_by_state as defined by sgkit.variables.stat_identity_by_state_spec. If the variable is not present in ds, it will be computed using identity_by_state().

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.stat_Weir_Goudet_beta_spec, a matrix of estimated pairwise kinship relative to the average kinship of all pairs of individuals in the dataset. The dimensions are named samples_0 and samples_1.

Examples

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=3, n_sample=3, n_allele=10, seed=3)
>>> # sample 2 "inherits" alleles from samples 0 and 1
>>> ds.call_genotype.data[:, 2, 0] = ds.call_genotype.data[:, 0, 0]
>>> ds.call_genotype.data[:, 2, 1] = ds.call_genotype.data[:, 1, 0]
>>> sg.display_genotypes(ds) 
samples    S0   S1   S2
variants
0         7/1  8/6  7/8
1         9/5  3/6  9/3
2         8/8  8/3  8/8
>>> # estimate beta
>>> ds = sg.Weir_Goudet_beta(ds).compute()
>>> ds.stat_Weir_Goudet_beta.values 
array([[ 0.5 , -0.25,  0.25],
       [-0.25,  0.25,  0.  ],
       [ 0.25,  0.  ,  0.5 ]])
>>> # correct beta assuming least related samples are unrelated
>>> beta = ds.stat_Weir_Goudet_beta
>>> beta0 = beta.min()
>>> beta_corrected = (beta - beta0) / (1 - beta0)
>>> beta_corrected.values 
array([[0.6, 0. , 0.4],
       [0. , 0.4, 0.2],
       [0.4, 0.2, 0.6]])

References

[1] - Bruce S. Weir and Jérôme Goudet. 2017. “A Unified Characterization of Population Structure and Relatedness.” Genetics 206 (4): 2085-2103.