sgkit.pedigree_kinship
- sgkit.pedigree_kinship(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', chunks=None, return_relationship=False, allow_half_founders=False, founder_kinship=None, founder_indices=None, merge=True)
Estimate expected pairwise kinship coefficients from pedigree structure.
This method can optionally return the additive relationship matrix (ARM or A-matrix).
- Parameters:
- ds
Dataset
Dataset containing pedigree structure.
- method {‘diploid’, ‘Hamilton-Kerr’}
Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')
The method used for kinship estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets, following Hamilton and Kerr 2017 [1].
- parent
Hashable (default: 'parent')
Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().
- stat_Hamilton_Kerr_tau
Hashable (default: 'stat_Hamilton_Kerr_tau')
Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.
- chunks
Hashable | None (default: None)
Optionally specify chunks for the returned matrices.
- stat_Hamilton_Kerr_lambda
Hashable (default: 'stat_Hamilton_Kerr_lambda')
Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.
- return_relationship
bool (default: False)
If True, the additive relationship matrix will be returned in addition to the kinship matrix.
- allow_half_founders
bool (default: False)
If False (the default), a ValueError will be raised if any individual has only a single recorded parent. If True, the unrecorded parent will be assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, the tau and lambda parameters for gametes contributing to the unrecorded parent will be assumed to be equal to those of the gamete originating from that parent.
- founder_kinship
Hashable | None (default: None)
Optionally specify an input kinship matrix as defined by sgkit.variables.stat_genomic_kinship_spec. Kinship estimates among founders within this matrix will be used to initialize the pedigree estimates as outlined by Goudet et al 2018 [2]. Kinship estimates for non-founders are ignored.
- founder_indices
Hashable | None (default: None)
Optionally specify an array of integer indices mapping rows/columns in a founder_kinship sub-matrix to sample positions in the samples dimension (i.e., the order of rows in the parent array). This variable must have the same length as founder_kinship.
Deprecated since version 0.7.0: Instead, use a ‘founder_kinship’ matrix with values for all pairs of samples (these may be nan values).
- merge
bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables.
- Return type:
Dataset
- Returns:
A dataset containing sgkit.variables.stat_pedigree_kinship_spec and, if return_relationship is True, sgkit.variables.stat_pedigree_relationship_spec.
- Raises:
ValueError – If an unknown method is specified.
ValueError – If the pedigree contains a directed loop.
ValueError – If the diploid method is used with a non-diploid dataset.
ValueError – If the diploid method is used and the parents dimension does not have a length of two.
ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.
ValueError – If the pedigree contains half-founders and allow_half_founders=False.
ValueError – If the dimension sizes of founder_kinship are not equal to the number of samples in the pedigree (when founder_indices is not specified).
ValueError – If the founder_kinship and founder_indices variables are both specified and have inconsistent shapes.
Note
This method is faster when a pedigree is sorted in topological order such that parents occur before their children.
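As a rough illustration of the ordering mentioned in this note, the sketch below computes a topological order from an array of parent indices using Kahn's algorithm. The function name `topological_order` is hypothetical and not part of sgkit, and the use of -1 to mark unknown parents is an assumption of the sketch.

```python
from collections import deque

import numpy as np


def topological_order(parent):
    """Return sample indices ordered so that parents precede their children.

    `parent` is an (n_samples, n_parents) integer array of parent indices,
    with -1 (an assumption of this sketch) marking an unknown parent.
    """
    parent = np.asarray(parent)
    n = len(parent)
    children = [[] for _ in range(n)]
    n_pending = [0] * n  # number of known parents not yet placed in the order
    for child in range(n):
        for p in parent[child]:
            if p >= 0:
                children[int(p)].append(child)
                n_pending[child] += 1
    ready = deque(i for i in range(n) if n_pending[i] == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        for child in children[i]:
            n_pending[child] -= 1
            if n_pending[child] == 0:
                ready.append(child)
    if len(order) != n:
        raise ValueError("pedigree contains a directed loop")
    return order


# Sample 0 is listed first but is a child of samples 2 and 3.
unsorted_parent = [[2, 3], [-1, -1], [1, -1], [1, 2]]
order = topological_order(unsorted_parent)
print(order)
```

If parents are recorded by sample identifier (as in the `parent_id` examples in this page), reordering the samples dimension with the resulting order, for example through xarray's `Dataset.isel`, should leave those identifiers valid; an existing array of parent indices would need to be recomputed after reordering.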
Note
The diagonal values of sgkit.variables.stat_pedigree_kinship_spec are self-kinship estimates as opposed to inbreeding estimates.
Note
Dimensions of sgkit.variables.stat_pedigree_kinship_spec and sgkit.variables.stat_pedigree_relationship_spec are named samples_0 and samples_1.
Note
Chunked kinship computation is implemented by identifying the sub-pedigree corresponding to each output chunk. An intermediate kinship matrix is then calculated which includes the chunk samples and their ancestors. This can be inefficient in deep pedigrees with many generations.
Note
If founder kinships are specified for a half-founder, then that individual will be treated as a full-founder by ignoring its known parent.
Note
The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
Examples
Inbred diploid pedigree returning additive relationship matrix:
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_kinship(ds, return_relationship=True)
>>> ds["stat_pedigree_kinship"].values
array([[0.5  , 0.   , 0.25 , 0.375],
       [0.   , 0.5  , 0.25 , 0.125],
       [0.25 , 0.25 , 0.5  , 0.375],
       [0.375, 0.125, 0.375, 0.625]])
>>> ds["stat_pedigree_relationship"].values
array([[1.  , 0.  , 0.5 , 0.75],
       [0.  , 1.  , 0.5 , 0.25],
       [0.5 , 0.5 , 1.  , 0.75],
       [0.75, 0.25, 0.75, 1.25]])
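To see where the values in this example come from, the following is a minimal NumPy sketch of the standard diploid kinship recursion. It is not sgkit's implementation: the function `diploid_kinship` is hypothetical, it assumes the pedigree is already sorted so that parents precede children, it uses -1 for unknown parents, and it treats founders as unrelated and non-inbred.

```python
import numpy as np


def diploid_kinship(parent):
    """Pairwise kinship for a sorted diploid pedigree (illustrative only).

    `parent` is an (n_samples, 2) integer array of parent indices with -1
    (an assumption of this sketch) marking an unknown parent.
    """
    parent = np.asarray(parent)
    n = len(parent)
    kinship = np.zeros((n, n))
    for i in range(n):
        p, q = parent[i]
        if p < 0 and q < 0:
            # Founder: unrelated to earlier samples, self-kinship of 0.5.
            kinship[i, i] = 0.5
            continue
        if p < 0 or q < 0:
            raise ValueError("half-founders are not handled in this sketch")
        if p >= i or q >= i:
            raise ValueError("parents must precede their children")
        # Kinship with each earlier sample is the mean of the parents' kinships.
        kinship[i, :i] = 0.5 * (kinship[p, :i] + kinship[q, :i])
        kinship[:i, i] = kinship[i, :i]
        # Self-kinship depends on the kinship between the two parents.
        kinship[i, i] = 0.5 * (1.0 + kinship[p, q])
    return kinship


# Same pedigree as the example above: S2 = S0 x S1, S3 = S0 x S2.
parent = [[-1, -1], [-1, -1], [0, 1], [0, 2]]
kinship = diploid_kinship(parent)
relationship = 2.0 * kinship              # additive relationship matrix (diploid case)
inbreeding = 2.0 * np.diag(kinship) - 1.0  # inbreeding from self-kinship (diploid case)
print(kinship)
print(inbreeding)
```

The last two lines illustrate the distinction drawn in the notes: the diagonal holds self-kinship, and for diploids the inbreeding coefficient can be recovered from it as 2 × self-kinship − 1.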
Inbred diploid pedigree with related founders:
>>> import sgkit as sg
>>> from numpy import nan
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> # add "known" kinships among founders
>>> ds["founder_kinship"] = ["samples_0", "samples_1"], [
...     [0.5, 0.1, nan, nan],
...     [0.1, 0.6, nan, nan],
...     [nan, nan, nan, nan],
...     [nan, nan, nan, nan],
... ]
>>> ds = sg.pedigree_kinship(
...     ds,
...     founder_kinship="founder_kinship",
... )
>>> ds["stat_pedigree_kinship"].values
array([[0.5  , 0.1  , 0.3  , 0.4  ],
       [0.1  , 0.6  , 0.35 , 0.225],
       [0.3  , 0.35 , 0.55 , 0.425],
       [0.4  , 0.225, 0.425, 0.65 ]])
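Since founder_indices is deprecated in favour of a founder_kinship matrix covering all pairs of samples, founder kinships held as a sub-matrix can be expanded into that full form. The snippet below is one possible way to do this with plain NumPy; the variable names `sub_kinship` and `full_kinship` are illustrative and not part of sgkit.

```python
import numpy as np

n_samples = 4
# Positions of the founders within the samples dimension (illustrative values).
founder_indices = np.array([0, 1])
# Kinship among those founders only, in the same order as founder_indices.
sub_kinship = np.array([
    [0.5, 0.1],
    [0.1, 0.6],
])

# Start from an all-NaN matrix and fill in the founder-by-founder block.
full_kinship = np.full((n_samples, n_samples), np.nan)
full_kinship[np.ix_(founder_indices, founder_indices)] = sub_kinship
print(full_kinship)
```

The resulting array has the same layout as the founder_kinship variable assigned in the example above and could be stored with dimensions samples_0 and samples_1 in the same way.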
Somatic doubling and unreduced gamete:
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S0'],  # somatic doubling encoded as selfing
...     ['S1', 'S2'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [1, 1],
...     [2, 2],  # both 'gametes' are full genomic copies
...     [2, 2],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])
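The self-kinship values on the diagonal of this example can be reproduced by hand. The sketch below uses a common formulation of gamete and offspring inbreeding for autopolyploids in terms of tau (homologues contributed) and lambda (excess probability of identity by descent within a gamete). These helper functions are hypothetical, written for illustration under that assumed formulation, and are not taken from sgkit's source.

```python
def inbreeding_from_self_kinship(self_kinship, ploidy):
    # Inverts self_kinship = (1 + (ploidy - 1) * F) / ploidy.
    return (ploidy * self_kinship - 1.0) / (ploidy - 1)


def self_kinship_from_inbreeding(inbreeding, ploidy):
    return (1.0 + (ploidy - 1) * inbreeding) / ploidy


def gamete_inbreeding(parent_inbreeding, lam):
    # Assumed form: lambda adds IBD probability on top of the parent's inbreeding.
    return lam + (1.0 - lam) * parent_inbreeding


def offspring_inbreeding(tau_p, tau_q, f_gamete_p, f_gamete_q, kinship_pq):
    # Average IBD probability over all pairs of distinct homologues in the
    # offspring: pairs within each gamete plus pairs spanning the two gametes.
    ploidy = tau_p + tau_q
    within = tau_p * (tau_p - 1) * f_gamete_p + tau_q * (tau_q - 1) * f_gamete_q
    between = 2 * tau_p * tau_q * kinship_pq
    return (within + between) / (ploidy * (ploidy - 1))


# S2: somatic doubling of diploid founder S0, encoded as selfing with tau = [2, 2].
f_s0 = inbreeding_from_self_kinship(0.5, ploidy=2)  # non-inbred founder
g_s0 = gamete_inbreeding(f_s0, lam=0.0)
f_s2 = offspring_inbreeding(2, 2, g_s0, g_s0, kinship_pq=0.5)  # kinship of S0 with itself
k_s2 = self_kinship_from_inbreeding(f_s2, ploidy=4)

# S3: unreduced gamete from diploid S1 (lambda = 0.1) and a gamete from tetraploid S2.
g_s1 = gamete_inbreeding(0.0, lam=0.1)
g_s2 = gamete_inbreeding(f_s2, lam=0.0)
f_s3 = offspring_inbreeding(2, 2, g_s1, g_s2, kinship_pq=0.0)  # S1 and S2 are unrelated
k_s3 = self_kinship_from_inbreeding(f_s3, ploidy=4)

print(k_s2, k_s3)
```

Under these assumptions the computed self-kinships agree with the diagonal entries for S2 and S3 in the matrix above.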
Somatic doubling and unreduced gamete using a third parent column to indicate clonal propagation:
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['.', '.', 'S0'],  # somatic doubling encoded as clone
...     ['S1', 'S2', '.'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [1, 1, 0],
...     [0, 0, 4],  # 4 homologues derived from diploid 'S0'
...     [2, 2, 0],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0, 0, 1/3],  # increased probability of IBD in somatic doubling
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])
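The constraint described in the notes, that each row of stat_Hamilton_Kerr_tau must contain either one or two non-zero values, can be checked before calling the function. The following is a small NumPy sketch of such a check using the tau values from the example above; it is an illustrative pre-check and not part of sgkit.

```python
import numpy as np

# tau values from the three-parent-column example above.
tau = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 4],
    [2, 2, 0],
])

# A contributing parent is indicated by a tau value greater than zero.
n_contributing = np.count_nonzero(tau > 0, axis=1)
valid = (n_contributing >= 1) & (n_contributing <= 2)

if not valid.all():
    bad_rows = np.flatnonzero(~valid)
    raise ValueError(f"samples with invalid tau rows: {bad_rows.tolist()}")
print(n_contributing)
```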
References
[1] - Matthew G. Hamilton and Richard J. Kerr 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.
[2] - Jérôme Goudet, Tomas Kay and Bruce S. Weir 2018. “How to estimate kinship.” Molecular Ecology 27: 4121-4135.