sgkit.pedigree_kinship
- sgkit.pedigree_kinship(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', return_relationship=False, allow_half_founders=False, founder_kinship=None, founder_indices=None, merge=True)
Estimate expected pairwise kinship coefficients from pedigree structure.
This method can optionally return the additive relationship matrix (ARM or A-matrix).
- Parameters
- ds : Dataset
Dataset containing pedigree structure.
- method : Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')
The method used for kinship estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets, following Hamilton and Kerr 2017 [1].
- parent : Hashable (default: 'parent')
Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().
- stat_Hamilton_Kerr_tau : Hashable (default: 'stat_Hamilton_Kerr_tau')
Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.
- stat_Hamilton_Kerr_lambda : Hashable (default: 'stat_Hamilton_Kerr_lambda')
Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.
- return_relationship : bool (default: False)
If True, the additive relationship matrix will be returned in addition to the kinship matrix.
- allow_half_founders : bool (default: False)
If False (the default), a ValueError is raised if any individual has only a single recorded parent. If True, the unrecorded parent is assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, the tau and lambda parameters for gametes contributing to the unrecorded parent are assumed to be equal to those of the gamete originating from that parent.
- founder_kinship : Optional[Hashable] (default: None)
Optionally specify a matrix of pairwise kinship estimates among founder samples, which will be used to initialize pedigree estimates as outlined by Goudet et al. 2018 [2]. This variable must be a square matrix of shape (founders, founders) and must be used in conjunction with founder_indices.
- founder_indices : Optional[Hashable] (default: None)
Optionally specify an array of integer indices mapping rows/columns in the founder_kinship matrix to sample positions in the samples dimension (i.e., the order of rows in the parent array). This variable must have the same length as founder_kinship.
- merge : bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables.
- Return type
Dataset
- Returns
A dataset containing sgkit.variables.stat_pedigree_kinship_spec and, if return_relationship is True, sgkit.variables.stat_pedigree_relationship_spec.
- Raises
ValueError – If an unknown method is specified.
ValueError – If the pedigree contains a directed loop.
ValueError – If the diploid method is used with a non-diploid dataset.
ValueError – If the diploid method is used and the parents dimension does not have a length of two.
ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.
ValueError – If the pedigree contains half-founders and allow_half_founders=False.
ValueError – If only one of the founder_kinship or founder_indices variables is specified.
ValueError – If the founder_kinship or founder_indices variables have inconsistent shapes.
ValueError – If a founder is missing from the founder_indices array or if a non-founder is indicated by this array.
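The half-founder error can be anticipated before calling the function. The following is a minimal sketch, assuming a two-column parent array with unknown parents encoded as "." as in the examples on this page; `find_half_founders` is a hypothetical helper and not part of sgkit.

```python
def find_half_founders(parent_ids, missing="."):
    """Return indices of samples with exactly one recorded parent.

    Hypothetical helper (not part of sgkit). Assumes each row holds
    two parent IDs and that `missing` marks an unrecorded parent.
    """
    half = []
    for i, parents in enumerate(parent_ids):
        n_known = sum(p != missing for p in parents)
        if n_known == 1:
            half.append(i)
    return half


parent_ids = [
    [".", "."],    # founder
    [".", "."],    # founder
    ["S0", "."],   # half-founder: only one recorded parent
    ["S0", "S1"],  # both parents recorded
]
half_founders = find_half_founders(parent_ids)  # → [2]
```

If the returned list is not empty, either complete the pedigree or pass allow_half_founders=True.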
Note
This method is faster when a pedigree is sorted in topological order such that parents occur before their children.
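One way to obtain such an ordering is sketched below using only the Python standard library. The `parent_ids` mapping and the `topological_order` helper are hypothetical names for illustration and are not part of sgkit; unknown parents are assumed to be encoded as "." as in the examples on this page.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def topological_order(parent_ids, missing="."):
    """Return sample IDs ordered so that every parent precedes its children.

    `parent_ids` maps each sample ID to a sequence of parent IDs
    (a hypothetical input layout, not an sgkit data structure).
    """
    graph = {
        sample: {p for p in parents if p != missing}
        for sample, parents in parent_ids.items()
    }
    # static_order() raises graphlib.CycleError if the pedigree has a directed loop
    return list(TopologicalSorter(graph).static_order())


pedigree = {
    "S3": ["S0", "S2"],
    "S2": ["S0", "S1"],
    "S1": [".", "."],
    "S0": [".", "."],
}
order = topological_order(pedigree)
```

The resulting order can then be used to reorder the samples dimension of the dataset before computing kinship.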
Note
The diagonal values of sgkit.variables.stat_pedigree_kinship_spec are self-kinship estimates as opposed to inbreeding estimates.
Note
Dimensions of sgkit.variables.stat_pedigree_kinship_spec and sgkit.variables.stat_pedigree_relationship_spec are named samples_0 and samples_1.
Note
If founder kinships are specified for a half-founder, then that individual will be treated as a full-founder by ignoring its known parent.
Note
The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
Examples
Inbred diploid pedigree returning additive relationship matrix:
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_kinship(ds, return_relationship=True)
>>> ds["stat_pedigree_kinship"].values
array([[0.5  , 0.   , 0.25 , 0.375],
       [0.   , 0.5  , 0.25 , 0.125],
       [0.25 , 0.25 , 0.5  , 0.375],
       [0.375, 0.125, 0.375, 0.625]])
>>> ds["stat_pedigree_relationship"].values
array([[1.  , 0.  , 0.5 , 0.75],
       [0.  , 1.  , 0.5 , 0.25],
       [0.5 , 0.5 , 1.  , 0.75],
       [0.75, 0.25, 0.75, 1.25]])
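The kinship values in the example above are consistent with the standard tabular recursion for diploid pedigrees: the kinship between a sample and an earlier sample is the mean of its parents' kinships with that sample, and self-kinship is half of one plus the kinship between the parents. The following pure-Python sketch illustrates that recursion and is not sgkit's implementation; `diploid_kinship` and the use of integer parent indices with -1 for unknown parents are assumptions made for illustration.

```python
def diploid_kinship(parents):
    """Tabular kinship for a diploid pedigree (illustrative sketch only).

    parents[i] is a (p, q) pair of parent indices, with -1 for unknown.
    Samples must be sorted so that parents occur before their children.
    Founders are assumed to be non-inbred and mutually unrelated.
    """
    n = len(parents)
    k = [[0.0] * n for _ in range(n)]
    for i, (p, q) in enumerate(parents):
        if p < 0 and q < 0:
            k[i][i] = 0.5  # self-kinship of a non-inbred founder
            continue
        if p < 0 or q < 0:
            raise ValueError("half-founders are not handled in this sketch")
        if p >= i or q >= i:
            raise ValueError("pedigree is not in topological order")
        for j in range(i):
            # kinship with an earlier sample: mean over the two parents
            k[i][j] = k[j][i] = 0.5 * (k[p][j] + k[q][j])
        # self-kinship depends on the kinship between the parents
        k[i][i] = 0.5 * (1.0 + k[p][q])
    return k


# Same pedigree as above: S2 = S0 x S1, S3 = S0 x S2
kinship = diploid_kinship([(-1, -1), (-1, -1), (0, 1), (0, 2)])
print(kinship[3])  # → [0.375, 0.125, 0.375, 0.625]
```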
Inbred diploid pedigree with related founders:
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> # add "known" kinships among founders
>>> ds["founder_kinship"] = ["founders_0", "founders_1"], [
...     [0.5, 0.1],
...     [0.1, 0.6],
... ]
>>> # founder kinships correspond to the first two samples
>>> ds["founder_indices"] = ["founders"], [0, 1]
>>> ds = sg.pedigree_kinship(
...     ds,
...     founder_kinship="founder_kinship",
...     founder_indices="founder_indices",
... )
>>> ds["stat_pedigree_kinship"].values
array([[0.5  , 0.1  , 0.3  , 0.4  ],
       [0.1  , 0.6  , 0.35 , 0.225],
       [0.3  , 0.35 , 0.55 , 0.425],
       [0.4  , 0.225, 0.425, 0.65 ]])
Somatic doubling and unreduced gamete:
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S0'],  # somatic doubling encoded as selfing
...     ['S1', 'S2'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [1, 1],
...     [2, 2],  # both 'gametes' are full genomic copies
...     [2, 2],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])
Somatic doubling and unreduced gamete using a third parent column to indicate clonal propagation:
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['.', '.', 'S0'],  # somatic doubling encoded as clone
...     ['S1', 'S2', '.'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [1, 1, 0],
...     [0, 0, 4],  # 4 homologues derived from diploid 'S0'
...     [2, 2, 0],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0, 0, 1/3],  # increased probability of IBD in somatic doubling
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])
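Because the diagonal of the kinship matrix holds self-kinship rather than inbreeding, a conversion step is needed when inbreeding coefficients are wanted. For diploids, the inbreeding coefficient is twice the self-kinship minus one, and the additive relationship matrix is twice the kinship matrix. The sketch below applies both relations to the kinship values from the first example; it is restricted to the diploid case, uses plain Python lists rather than the dataset variables, and the variable names are illustrative only.

```python
# Kinship values copied from the first (diploid) example on this page
kinship = [
    [0.5, 0.0, 0.25, 0.375],
    [0.0, 0.5, 0.25, 0.125],
    [0.25, 0.25, 0.5, 0.375],
    [0.375, 0.125, 0.375, 0.625],
]

# Diploid inbreeding coefficient: F_i = 2 * k_ii - 1
inbreeding = [2.0 * kinship[i][i] - 1.0 for i in range(len(kinship))]
print(inbreeding)  # → [0.0, 0.0, 0.0, 0.25]

# Diploid additive relationship: A = 2 * K
relationship = [[2.0 * value for value in row] for row in kinship]
print(relationship[3])  # → [0.75, 0.25, 0.75, 1.25]
```

The inbreeding value of 0.25 for S3 equals the kinship between its parents S0 and S2, as expected for a diploid.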
References
[1] - Matthew G. Hamilton and Richard J. Kerr 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.
[2] - Jérôme Goudet, Tomas Kay and Bruce S. Weir 2018. “How to estimate kinship.” Molecular Ecology 27: 4121-4135.