sgkit.pedigree_inverse_kinship

sgkit.pedigree_inverse_kinship(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', return_relationship=False, allow_half_founders=False, merge=True)

Calculate the inverse of the kinship matrix from pedigree structure.

This function can optionally also return the inverse of the additive relationship matrix (ARM or A-matrix).

Parameters

ds : Dataset: Dataset containing pedigree structure.
method : Literal[‘diploid’, ‘Hamilton-Kerr’] (default: 'diploid'): The method used for kinship estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets, following Hamilton and Kerr 2017 [1].
parent : Hashable (default: 'parent'): Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().
stat_Hamilton_Kerr_tau : Hashable (default: 'stat_Hamilton_Kerr_tau'): Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.
stat_Hamilton_Kerr_lambda : Hashable (default: 'stat_Hamilton_Kerr_lambda'): Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.
return_relationship : bool (default: False): If True, the inverse of the additive relationship matrix will be returned in addition to the inverse of the kinship matrix.
allow_half_founders : bool (default: False): If False (the default), a ValueError is raised if any individual has only a single recorded parent. If True, the unrecorded parent is assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, then the tau and lambda parameters for gametes contributing to the unrecorded parent will be assumed to be equal to those of the gamete originating from that parent.
merge : bool (default: True): If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables.
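The half-founder assumption described under allow_half_founders can be pictured as a rewrite of the parent table in which each missing parent of a half-founder is replaced by a brand-new founder. The sketch below is illustrative only: the helper name add_unique_founders, the "." missing-value marker, the generated identifiers and the restriction to two parent columns are all assumptions, and sgkit may implement the option differently internally.

```python
def add_unique_founders(sample_ids, parent_ids, missing="."):
    """Hypothetical helper: give each half-founder a new, unique founder parent.

    Assumes two parent columns, as used by the diploid method. Each generated
    founder appears exactly once, so it is unrelated to every other founder.
    Generated founders are placed first so that parents precede offspring.
    """
    founder_ids, founder_parents, out_parents = [], [], []
    for sample, parents in zip(sample_ids, parent_ids):
        recorded = [p for p in parents if p != missing]
        if len(recorded) == 1:  # half-founder: exactly one recorded parent
            founder = f"unknown_parent_of_{sample}"
            founder_ids.append(founder)
            founder_parents.append([missing, missing])  # the new founder has no parents
            parents = [founder if p == missing else p for p in parents]
        out_parents.append(list(parents))
    return founder_ids + list(sample_ids), founder_parents + out_parents


ids, parents = add_unique_founders(["S0", "S1"], [[".", "."], ["S0", "."]])
print(ids)      # ['unknown_parent_of_S1', 'S0', 'S1']
print(parents)  # [['.', '.'], ['.', '.'], ['S0', 'unknown_parent_of_S1']]
```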

Return type

Dataset

Returns

A dataset containing sgkit.variables.stat_pedigree_inverse_kinship_spec and, if return_relationship is True, sgkit.variables.stat_pedigree_inverse_relationship_spec.

Raises

ValueError – If an unknown method is specified.
ValueError – If the (intermediate) kinship matrix is singular.
ValueError – If the diploid method is used with a non-diploid dataset.
ValueError – If the diploid method is used and the parents dimension does not have a length of two.
ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.
ValueError – If the pedigree contains half-founders and allow_half_founders=False.
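As a rough illustration, the conditions above that concern the diploid method can be checked on plain Python data before calling the function. This is a hypothetical pre-flight helper (check_diploid_pedigree, its arguments and its messages are assumptions, not sgkit API); it treat "." as the missing-parent marker used in the examples.

```python
def check_diploid_pedigree(ploidy, parent_ids, allow_half_founders=False, missing="."):
    """Hypothetical check mirroring the diploid-related ValueErrors listed above."""
    if ploidy != 2:
        raise ValueError("the diploid method requires a diploid dataset")
    if any(len(parents) != 2 for parents in parent_ids):
        raise ValueError("the parents dimension must have a length of two")
    if not allow_half_founders:
        for i, parents in enumerate(parent_ids):
            # a half-founder has exactly one recorded (non-missing) parent
            if sum(p != missing for p in parents) == 1:
                raise ValueError(f"sample {i} has a single recorded parent")


# the pedigree from the first example passes silently
check_diploid_pedigree(2, [[".", "."], [".", "."], ["S0", "S1"], ["S0", "S2"]])
```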

Note

Dimensions of sgkit.variables.stat_pedigree_inverse_kinship_spec and sgkit.variables.stat_pedigree_inverse_relationship_spec are named samples_0 and samples_1.

Note

The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
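The rule in this note can be expressed as a count of non-zero entries per row of stat_Hamilton_Kerr_tau. The sketch below uses plain nested lists in place of the dataset variable; both helper names are hypothetical.

```python
def contributing_parent_counts(tau):
    """Number of contributing parents (tau > 0) recorded for each sample."""
    return [sum(t > 0 for t in row) for row in tau]


def check_hamilton_kerr_tau(tau):
    """Hypothetical check: every sample needs one or two contributing parents."""
    for i, count in enumerate(contributing_parent_counts(tau)):
        if count not in (1, 2):
            raise ValueError(f"sample {i} has {count} contributing parents")


# tau from the three-parent-column example: the half-clone has one contributing parent
tau = [[1, 1, 0], [2, 2, 0], [2, 2, 0], [0, 0, 2]]
print(contributing_parent_counts(tau))  # [2, 2, 2, 1]
check_hamilton_kerr_tau(tau)  # passes silently
```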

Examples

Inbred diploid pedigree, also returning the inverse additive relationship matrix:

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, return_relationship=True)
>>> ds["stat_pedigree_inverse_kinship"].values 
array([[ 4.,  1., -1., -2.],
       [ 1.,  3., -2.,  0.],
       [-1., -2.,  5., -2.],
       [-2.,  0., -2.,  4.]])
>>> ds["stat_pedigree_inverse_relationship"].values 
array([[ 2. ,  0.5, -0.5, -1. ],
       [ 0.5,  1.5, -1. ,  0. ],
       [-0.5, -1. ,  2.5, -1. ],
       [-1. ,  0. , -1. ,  2. ]])
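For a pedigree this small, the printed matrices can be cross-checked with a naive dense computation: build the kinship matrix with the classic tabular (recursive) method and invert it exactly with fractions. This is a sketch for illustration only, not a description of how sgkit computes the result (dense inversion scales cubically with the number of samples); the function names are hypothetical. Because A = 2K for diploids, the inverse relationship matrix is half the inverse kinship matrix, which matches the second array above.

```python
from fractions import Fraction


def diploid_kinship(parents):
    """Kinship matrix of a diploid pedigree by the tabular method.

    parents[i] is a pair of parent indices, with None for an unknown parent.
    Parents must precede their offspring; unknown parents are treated as
    unrelated, non-inbred founders.
    """
    n = len(parents)
    K = [[Fraction(0)] * n for _ in range(n)]
    for i, (p, q) in enumerate(parents):
        for j in range(i):
            # kinship with an earlier sample: mean over the two parents
            K[i][j] = K[j][i] = sum((K[x][j] for x in (p, q) if x is not None), Fraction(0)) / 2
        # self-kinship: (1 + kinship between the parents) / 2
        k_pq = K[p][q] if p is not None and q is not None else Fraction(0)
        K[i][i] = (1 + k_pq) / 2
    return K


def invert(M):
    """Exact Gauss-Jordan inversion; raises ValueError if M is singular."""
    n = len(M)
    # augment M with the identity matrix
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        A[c], A[pivot] = A[pivot], A[c]
        scale = A[c][c]
        A[c] = [x / scale for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]


# S0 and S1 are founders, S2 = S0 x S1 and S3 = S0 x S2, as in the example
K = diploid_kinship([(None, None), (None, None), (0, 1), (0, 2)])
K_inv = invert(K)
A_inv = [[x / 2 for x in row] for row in K_inv]  # A = 2K for diploids
print([[int(x) for x in row] for row in K_inv])
# [[4, 1, -1, -2], [1, 3, -2, 0], [-1, -2, 5, -2], [-2, 0, -2, 4]]
```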

Unreduced gamete and half-clone:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S1'],  # diploid * tetraploid
...     ['S2', '.'],  # half-clone of 'S2'
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [2, 2],
...     [2, 2],  # unreduced gamete from diploid 'S0'
...     [2, 0],  # contribution from 'S2' only
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
...     [0, 0],
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inverse_kinship"].values  
array([[ 5.33333333,  3.33333333, -6.66666667,  0.        ],
       [ 3.33333333,  7.33333333, -6.66666667,  0.        ],
       [-6.66666667, -6.66666667, 17.40112994, -4.06779661],
       [ 0.        ,  0.        , -4.06779661,  4.06779661]])
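The tetraploid example can be cross-checked in the same spirit. The recursion below was written for this illustration and follows the Hamilton and Kerr model as we read it: a sample's ploidy is the sum of its tau values; its kinship with an earlier sample is the tau-weighted mean of its parents' kinships with that sample; and two distinct alleles in a gamete from parent p are IBD with probability λ + (1 − λ)(k_p·K_pp − 1)/(k_p − 1), where k_p is the parent's ploidy. Unknown parents are assumed to be unrelated, non-inbred founders. The function name and argument layout are assumptions; consult [1] and the sgkit source for the authoritative formulation. Under these assumptions, the kinship matrix multiplied by the inverse printed above gives the identity to the displayed precision.

```python
def hamilton_kerr_kinship(parents, tau, lam):
    """Dense kinship matrix for a mixed-ploidy pedigree (illustrative sketch).

    parents[i][c] is a parent index or None (unknown); tau[i][c] and lam[i][c]
    describe the gamete that parent column c passes to sample i.
    Parents must precede their offspring.
    """
    n = len(parents)
    ploidy = [sum(row) for row in tau]  # ploidy of a sample = sum of its tau
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        k_i = ploidy[i]
        # (parent, tau, lambda) for each contributing column, i.e. tau > 0
        gametes = [(p, t, l) for p, t, l in zip(parents[i], tau[i], lam[i]) if t > 0]
        for j in range(i):
            # tau-weighted mean of the parents' kinship with sample j
            K[i][j] = K[j][i] = sum(t * K[p][j] for p, t, _ in gametes if p is not None) / k_i
        # weighted count of ordered pairs of distinct alleles that are IBD
        ibd_pairs = 0.0
        for p, t, l in gametes:
            if t > 1:
                if p is None:
                    f = l  # unknown parent: assumed non-inbred founder
                else:
                    k_p = ploidy[p]
                    f = l + (1 - l) * (k_p * K[p][p] - 1) / (k_p - 1)
                ibd_pairs += t * (t - 1) * f
        if len(gametes) == 2:
            (p1, t1, _), (p2, t2, _) = gametes
            if p1 is not None and p2 is not None:
                ibd_pairs += 2 * t1 * t2 * K[p1][p2]
        # self-kinship: two alleles drawn with replacement from the k_i alleles
        K[i][i] = 1 / k_i + ibd_pairs / k_i**2
    return K


parents = [[None, None], [None, None], [0, 1], [2, None]]
tau = [[1, 1], [2, 2], [2, 2], [2, 0]]
lam = [[0, 0], [0, 0], [0.1, 0], [0, 0]]
K = hamilton_kerr_kinship(parents, tau, lam)

# inverse printed in the example above
K_inv = [
    [5.33333333, 3.33333333, -6.66666667, 0.0],
    [3.33333333, 7.33333333, -6.66666667, 0.0],
    [-6.66666667, -6.66666667, 17.40112994, -4.06779661],
    [0.0, 0.0, -4.06779661, 4.06779661],
]
product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*K_inv)] for row in K]
print(all(abs(product[r][c] - (r == c)) < 1e-6 for r in range(4) for c in range(4)))  # True
```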

Unreduced gamete and half-clone using a third parent column to indicate clonal propagation:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['S0', 'S1', '.'],  # diploid * tetraploid
...     ['.', '.', 'S2'],  # half-clone of 'S2'
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [2, 2, 0],
...     [2, 2, 0],  # unreduced gamete from diploid 'S0'
...     [0, 0, 2],  # contribution from 'S2' only
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
...     [0, 0, 0],
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inverse_kinship"].values  
array([[ 5.33333333,  3.33333333, -6.66666667,  0.        ],
       [ 3.33333333,  7.33333333, -6.66666667,  0.        ],
       [-6.66666667, -6.66666667, 17.40112994, -4.06779661],
       [ 0.        ,  0.        , -4.06779661,  4.06779661]])
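Both tetraploid examples describe the same gametes, which is why they return the same matrix: only columns with a non-zero tau contribute. A small sketch that keeps only the contributing columns makes this explicit (contributions is a hypothetical helper, not an sgkit function):

```python
def contributions(parent_id, tau, lam):
    """Hypothetical helper: keep only contributing parent columns (tau > 0).

    Returns, for each sample, a list of (parent, tau, lambda) tuples.
    """
    return [
        [(p, t, l) for p, t, l in zip(parents, taus, lams) if t > 0]
        for parents, taus, lams in zip(parent_id, tau, lam)
    ]


two_columns = contributions(
    [[".", "."], [".", "."], ["S0", "S1"], ["S2", "."]],
    [[1, 1], [2, 2], [2, 2], [2, 0]],
    [[0, 0], [0, 0], [0.1, 0], [0, 0]],
)
three_columns = contributions(
    [[".", ".", "."], [".", ".", "."], ["S0", "S1", "."], [".", ".", "S2"]],
    [[1, 1, 0], [2, 2, 0], [2, 2, 0], [0, 0, 2]],
    [[0, 0, 0], [0, 0, 0], [0.1, 0, 0], [0, 0, 0]],
)
print(two_columns == three_columns)  # True
print(three_columns[3])  # [('S2', 2, 0)]
```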

References

[1] Matthew G. Hamilton and Richard J. Kerr (2017). “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851–860.