sgkit.pedigree_inverse_kinship

sgkit.pedigree_inverse_kinship(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', return_relationship=False, allow_half_founders=False, merge=True)

Calculate the inverse of the kinship matrix from pedigree structure.

This function can optionally also return the inverse of the additive relationship matrix (ARM or A-matrix).

Parameters:
ds Dataset

Dataset containing pedigree structure.

method Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')

The method used for kinship estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets, following Hamilton and Kerr 2017 [1].

parent Hashable (default: 'parent')

Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().

stat_Hamilton_Kerr_tau Hashable (default: 'stat_Hamilton_Kerr_tau')

Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.

stat_Hamilton_Kerr_lambda Hashable (default: 'stat_Hamilton_Kerr_lambda')

Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.

return_relationship bool (default: False)

If True, the inverse of the additive relationship matrix will be returned in addition to the inverse of the kinship matrix.

allow_half_founders bool (default: False)

If False (the default), a ValueError is raised if any individual has only a single recorded parent. If True, the unrecorded parent is assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, the tau and lambda parameters for gametes contributing to the unrecorded parent are assumed to be equal to those of the gamete originating from that parent.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables.
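
Before choosing a value for allow_half_founders, it can help to list which samples would trigger the error. The helper below is a minimal sketch in plain Python and is not part of sgkit: `find_half_founders` is a hypothetical name, and it assumes the `parent_id` convention used in the examples on this page, where "." marks an unrecorded parent. It reflects the diploid reading of "half-founder" (exactly one recorded parent out of two); with the Hamilton-Kerr method, which parents count as contributing also depends on stat_Hamilton_Kerr_tau.

```python
def find_half_founders(sample_ids, parent_ids, missing="."):
    """Return IDs of samples with exactly one recorded parent.

    Hypothetical helper for illustration; not an sgkit function.
    """
    half_founders = []
    for sample, parents in zip(sample_ids, parent_ids):
        # Count the parents that are actually recorded for this sample.
        n_recorded = sum(1 for p in parents if p != missing)
        if n_recorded == 1:
            half_founders.append(sample)
    return half_founders


sample_ids = ["S0", "S1", "S2", "S3"]
parent_ids = [
    [".", "."],    # founder: no recorded parents
    [".", "."],    # founder: no recorded parents
    ["S0", "S1"],  # both parents recorded
    ["S0", "."],   # half-founder: only one parent recorded
]
half_founders = find_half_founders(sample_ids, parent_ids)
print(half_founders)  # ['S3']
```

If the list is non-empty, the documented behaviour is that the default allow_half_founders=False raises a ValueError, so the pedigree either needs to be completed or the call needs allow_half_founders=True.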

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.stat_pedigree_inverse_kinship_spec and, if return_relationship is True, sgkit.variables.stat_pedigree_inverse_relationship_spec.

Raises:
  • ValueError – If an unknown method is specified.

  • ValueError – If the (intermediate) kinship matrix is singular.

  • ValueError – If the diploid method is used with a non-diploid dataset.

  • ValueError – If the diploid method is used and the parents dimension does not have a length of two.

  • ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.

  • ValueError – If the pedigree contains half-founders and allow_half_founders=False.

Note

The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
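
The row constraint described in this note can be checked before calling the function. The following is a minimal sketch in plain Python operating on a list of rows; `check_tau_rows` is a hypothetical name rather than an sgkit function, and the error message is illustrative only (it is not the message sgkit raises).

```python
def check_tau_rows(tau):
    """Raise ValueError unless every row has one or two non-zero values.

    Hypothetical pre-check for illustration; not an sgkit function.
    """
    for i, row in enumerate(tau):
        # A contributing parent is indicated by a value greater than zero.
        n_contributing = sum(1 for value in row if value > 0)
        if n_contributing not in (1, 2):
            raise ValueError(
                f"sample {i} has {n_contributing} contributing parents; "
                "expected 1 or 2"
            )


# Three parent columns, but at most two contributing parents per sample.
tau = [
    [1, 1, 0],
    [2, 2, 0],
    [2, 2, 0],
    [0, 0, 2],
]
check_tau_rows(tau)  # passes silently

try:
    check_tau_rows([[1, 1, 1]])  # three contributing parents
except ValueError as error:
    message = str(error)
    print(message)
```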

Examples

Inbred diploid pedigree returning inverse additive relationship matrix:

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, return_relationship=True)
>>> ds["stat_pedigree_inverse_kinship"].values 
array([[ 4.,  1., -1., -2.],
       [ 1.,  3., -2.,  0.],
       [-1., -2.,  5., -2.],
       [-2.,  0., -2.,  4.]])
>>> ds["stat_pedigree_inverse_relationship"].values 
array([[ 2. ,  0.5, -0.5, -1. ],
       [ 0.5,  1.5, -1. ,  0. ],
       [-0.5, -1. ,  2.5, -1. ],
       [-1. ,  0. , -1. ,  2. ]])
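
The two matrices above can be sanity-checked without sgkit. The sketch below is written in plain Python and uses the textbook tabular method for a diploid pedigree; it is not sgkit's implementation, and the function names are hypothetical. It builds the additive relationship matrix for the same pedigree, confirms that multiplying it by the inverse shown above gives the identity, and confirms that the inverse kinship matrix is twice the inverse relationship matrix (for diploids the additive relationship is twice the kinship).

```python
def additive_relationship(parents):
    """Build the A-matrix of a diploid pedigree with the tabular method.

    parents[i] holds the indices of the two parents of sample i, with
    None for an unrecorded parent. Parents must precede their offspring.
    """
    n = len(parents)
    a = [[0.0] * n for _ in range(n)]
    for i, (p, q) in enumerate(parents):
        for j in range(i):
            # Relationship with an earlier sample is the mean of the
            # parents' relationships with that sample.
            total = 0.0
            if p is not None:
                total += a[j][p]
            if q is not None:
                total += a[j][q]
            a[i][j] = a[j][i] = 0.5 * total
        # Diagonal is 1 + F, where F is half the relationship of the parents.
        inbreeding = 0.5 * a[p][q] if None not in (p, q) else 0.0
        a[i][i] = 1.0 + inbreeding
    return a


def matmul(x, y):
    """Multiply two square matrices stored as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*y)] for row in x]


# Same pedigree as above: S2 = S0 x S1, S3 = S0 x S2.
parents = [(None, None), (None, None), (0, 1), (0, 2)]
a = additive_relationship(parents)

# Values copied from the doctest output above.
inverse_relationship = [
    [2.0, 0.5, -0.5, -1.0],
    [0.5, 1.5, -1.0, 0.0],
    [-0.5, -1.0, 2.5, -1.0],
    [-1.0, 0.0, -1.0, 2.0],
]
inverse_kinship = [
    [4.0, 1.0, -1.0, -2.0],
    [1.0, 3.0, -2.0, 0.0],
    [-1.0, -2.0, 5.0, -2.0],
    [-2.0, 0.0, -2.0, 4.0],
]

product = matmul(a, inverse_relationship)
is_identity = all(
    abs(product[i][j] - (1.0 if i == j else 0.0)) < 1e-12
    for i in range(4)
    for j in range(4)
)
doubled = [[2.0 * value for value in row] for row in inverse_relationship]
print(is_identity, doubled == inverse_kinship)
```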

Unreduced gamete and half-clone:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S1'],  # diploid * tetraploid
...     ['S2', '.'],  # half-clone of 'S2'
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [2, 2],
...     [2, 2],  # unreduced gamete from diploid 'S0'
...     [2, 0],  # contribution from 'S2' only
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
...     [0, 0],
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inverse_kinship"].values  
array([[ 5.33333333,  3.33333333, -6.66666667,  0.        ],
       [ 3.33333333,  7.33333333, -6.66666667,  0.        ],
       [-6.66666667, -6.66666667, 17.40112994, -4.06779661],
       [ 0.        ,  0.        , -4.06779661,  4.06779661]])
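
One way to read the tau values in this example is sketched below. It assumes, following the usual Hamilton-Kerr interpretation, that each tau entry is the number of chromosome copies carried by the gamete from that parent, so that each row sums to the ploidy of the sample. This reading is an assumption for illustration and is not stated on this page, although it agrees with the inline comments ("diploid * tetraploid", "half-clone").

```python
sample_ids = ["S0", "S1", "S2", "S3"]

# Values copied from the example above.
tau = [
    [1, 1],  # S0
    [2, 2],  # S1
    [2, 2],  # S2: unreduced gamete from S0, reduced gamete from S1
    [2, 0],  # S3: contribution from S2 only
]

# Assumed interpretation: ploidy of a sample is the sum of its gametic ploidies.
ploidy = {sample: sum(row) for sample, row in zip(sample_ids, tau)}
print(ploidy)  # {'S0': 2, 'S1': 4, 'S2': 4, 'S3': 2}
```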

Unreduced gamete and half-clone using a third parent column to indicate clonal propagation:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['S0', 'S1', '.'],  # diploid * tetraploid
...     ['.', '.', 'S2'],  # half-clone of 'S2'
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [2, 2, 0],
...     [2, 2, 0],  # unreduced gamete from diploid 'S0'
...     [0, 0, 2],  # contribution from 'S2' only
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
...     [0, 0, 0],
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inverse_kinship"].values  
array([[ 5.33333333,  3.33333333, -6.66666667,  0.        ],
       [ 3.33333333,  7.33333333, -6.66666667,  0.        ],
       [-6.66666667, -6.66666667, 17.40112994, -4.06779661],
       [ 0.        ,  0.        , -4.06779661,  4.06779661]])

References

[1] Matthew G. Hamilton and Richard J. Kerr. 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.