sgkit.pedigree_inverse_kinship
- sgkit.pedigree_inverse_kinship(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', return_relationship=False, allow_half_founders=False, merge=True)
Calculate the inverse of the kinship matrix from pedigree structure.
This method can optionally return the inverse of the additive relationship matrix (ARM or A-matrix).
- Parameters:
- ds
Dataset
Dataset containing pedigree structure.
- method
Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')
The method used for kinship estimation. Defaults to “diploid” which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets following Hamilton and Kerr 2017 [1].
- parent
Hashable (default: 'parent')
Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().
- stat_Hamilton_Kerr_tau
Hashable (default: 'stat_Hamilton_Kerr_tau')
Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.
- stat_Hamilton_Kerr_lambda
Hashable (default: 'stat_Hamilton_Kerr_lambda')
Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.
- return_relationship
bool (default: False)
If True, the inverse of the additive relationship matrix will be returned in addition to the inverse of the kinship matrix.
- allow_half_founders
bool (default: False)
If False (the default) then a ValueError will be raised if any individuals only have a single recorded parent. If True then the unrecorded parent will be assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders then the tau and lambda parameters for gametes contributing to the unrecorded parent will be assumed to be equal to those of the gamete originating from that parent.
- merge
bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables.
- Return type:
Dataset
- Returns:
A dataset containing sgkit.variables.stat_pedigree_inverse_kinship_spec and, if return_relationship is True, sgkit.variables.stat_pedigree_inverse_relationship_spec.
- Raises:
ValueError – If an unknown method is specified.
ValueError – If the (intermediate) kinship matrix is singular.
ValueError – If the diploid method is used with a non-diploid dataset.
ValueError – If the diploid method is used and the parents dimension does not have a length of two.
ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.
ValueError – If the pedigree contains half-founders and allow_half_founders=False.
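To illustrate the last condition, half-founders can be spotted in a parent table before the function is called. The following is a minimal sketch in plain Python for the diploid case with two parent columns. It assumes that "." marks an unrecorded parent, as in the examples on this page. `find_half_founders` is a hypothetical helper written for this illustration and is not part of sgkit.

```python
def find_half_founders(sample_ids, parent_ids, missing="."):
    """Return IDs of samples with exactly one recorded parent.

    Hypothetical helper for illustration only; not an sgkit function.
    """
    half_founders = []
    for sample, parents in zip(sample_ids, parent_ids):
        # Count the parents that are actually recorded for this sample.
        n_known = sum(p != missing for p in parents)
        if n_known == 1:
            half_founders.append(sample)
    return half_founders


sample_ids = ["S0", "S1", "S2", "S3"]
parent_ids = [
    [".", "."],    # founder: no recorded parents
    [".", "."],    # founder: no recorded parents
    ["S0", "S1"],  # both parents recorded
    ["S0", "."],   # half-founder: only one parent recorded
]

half_founders = find_half_founders(sample_ids, parent_ids)
print(half_founders)
```

A non-empty result suggests that the call would need allow_half_founders=True, or that the pedigree records should be completed first.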
Note
Dimensions of sgkit.variables.stat_pedigree_inverse_kinship_spec and sgkit.variables.stat_pedigree_inverse_relationship_spec are named samples_0 and samples_1.
Note
The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
Examples
Inbred diploid pedigree returning inverse additive relationship matrix:
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, return_relationship=True)
>>> ds["stat_pedigree_inverse_kinship"].values
array([[ 4.,  1., -1., -2.],
       [ 1.,  3., -2.,  0.],
       [-1., -2.,  5., -2.],
       [-2.,  0., -2.,  4.]])
>>> ds["stat_pedigree_inverse_relationship"].values
array([[ 2. ,  0.5, -0.5, -1. ],
       [ 0.5,  1.5, -1. ,  0. ],
       [-0.5, -1. ,  2.5, -1. ],
       [-1. ,  0. , -1. ,  2. ]])
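The matrices in this example can be cross-checked by hand. The sketch below uses plain Python and does not call sgkit. It builds the kinship matrix for the same pedigree with the textbook recursive rules for diploids, then confirms that the kinship output shown above is its inverse and that halving that output reproduces the relationship output. It assumes parents are listed before their offspring and that founders are unrelated and non-inbred. It is an illustration of the result, not a description of how sgkit computes it.

```python
# Parent indices for S0..S3 from the example; -1 marks an unknown parent.
parents = [(-1, -1), (-1, -1), (0, 1), (0, 2)]
n = len(parents)


def kin(K, i, parent):
    """Kinship between sample i and a parent; zero if the parent is unknown."""
    return 0.0 if parent < 0 else K[i][parent]


K = [[0.0] * n for _ in range(n)]
for j, (p, q) in enumerate(parents):
    # Kinship with each earlier sample is the mean of the kinships
    # between that sample and the two parents of j.
    for i in range(j):
        K[i][j] = K[j][i] = 0.5 * (kin(K, i, p) + kin(K, i, q))
    # Self-kinship is (1 + kinship between the two parents) / 2.
    parental_kinship = 0.0 if p < 0 or q < 0 else K[p][q]
    K[j][j] = 0.5 * (1.0 + parental_kinship)

# Inverse kinship matrix as printed in the example above.
K_inv = [
    [4, 1, -1, -2],
    [1, 3, -2, 0],
    [-1, -2, 5, -2],
    [-2, 0, -2, 4],
]

# K multiplied by its inverse should be the identity matrix.
product = [
    [sum(K[i][k] * K_inv[k][j] for k in range(n)) for j in range(n)]
    for i in range(n)
]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
print(product == identity)

# For diploids the additive relationship matrix is twice the kinship
# matrix, so its inverse is half the inverse kinship matrix.
A_inv = [[value / 2 for value in row] for row in K_inv]
print(A_inv[0])
```

The self-kinship of S3 comes out above 0.5 because its parents S0 and S2 are related, which is why the example is described as inbred.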
Unreduced gamete and half-clone:
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S1'],  # diploid * tetraploid
...     ['S2', '.'],  # half-clone of 'S2'
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [2, 2],
...     [2, 2],  # unreduced gamete from diploid 'S0'
...     [2, 0],  # contribution from 'S2' only
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
...     [0, 0],
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inverse_kinship"].values
array([[ 5.33333333,  3.33333333, -6.66666667,  0.        ],
       [ 3.33333333,  7.33333333, -6.66666667,  0.        ],
       [-6.66666667, -6.66666667, 17.40112994, -4.06779661],
       [ 0.        ,  0.        , -4.06779661,  4.06779661]])
Unreduced gamete and half-clone using a third parent column to indicate clonal propagation:
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['S0', 'S1', '.'],  # diploid * tetraploid
...     ['.', '.', 'S2'],  # half-clone of 'S2'
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [2, 2, 0],
...     [2, 2, 0],  # unreduced gamete from diploid 'S0'
...     [0, 0, 2],  # contribution from 'S2' only
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
...     [0, 0, 0],
... ]
>>> ds = sg.pedigree_inverse_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inverse_kinship"].values
array([[ 5.33333333,  3.33333333, -6.66666667,  0.        ],
       [ 3.33333333,  7.33333333, -6.66666667,  0.        ],
       [-6.66666667, -6.66666667, 17.40112994, -4.06779661],
       [ 0.        ,  0.        , -4.06779661,  4.06779661]])
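The rule from the note on contributing parents (each row of stat_Hamilton_Kerr_tau must have one or two non-zero values) can be checked on the tau values of this three-column example before calling the function. The sketch below is plain Python. `contributing_parent_counts` and `invalid_rows` are hypothetical helpers for illustration, not sgkit functions, and the check only mirrors the rule as stated in the note.

```python
def contributing_parent_counts(tau):
    """Count contributing parents (tau greater than zero) for each sample."""
    return [sum(value > 0 for value in row) for row in tau]


def invalid_rows(tau):
    """Return indices of samples that do not have one or two contributing parents."""
    counts = contributing_parent_counts(tau)
    return [i for i, count in enumerate(counts) if count not in (1, 2)]


# Tau values from the three-parent-column example.
tau = [
    [1, 1, 0],
    [2, 2, 0],
    [2, 2, 0],  # unreduced gamete from diploid 'S0'
    [0, 0, 2],  # contribution from 'S2' only
]

counts = contributing_parent_counts(tau)
bad = invalid_rows(tau)

# Rows that break the rule: three contributing parents, then none.
bad_example = invalid_rows([[1, 1, 1], [0, 0, 0]])

print(counts, bad, bad_example)
```

Every row of the example passes, which is consistent with the third parent column being accepted even though the dataset has more than two parent columns.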
References
[1] Matthew G. Hamilton and Richard J. Kerr. 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.