sgkit.pedigree_kinship
- sgkit.pedigree_kinship(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', return_relationship=False, allow_half_founders=False, founder_kinship=None, founder_indices=None, merge=True)
Estimate expected pairwise kinship coefficients from pedigree structure.
This method can optionally return the additive relationship matrix (ARM or A-matrix).
- Parameters:
- ds
Dataset
Dataset containing pedigree structure.
- method {‘diploid’, ‘Hamilton-Kerr’}
Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')
The method used for kinship estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets following Hamilton and Kerr 2017 [1].
- parent
Hashable (default: 'parent')
Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().
- stat_Hamilton_Kerr_tau
Hashable (default: 'stat_Hamilton_Kerr_tau')
Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.
- stat_Hamilton_Kerr_lambda
Hashable (default: 'stat_Hamilton_Kerr_lambda')
Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.
- return_relationship
bool (default: False)
If True, the additive relationship matrix will be returned in addition to the kinship matrix.
- allow_half_founders
bool (default: False)
If False (the default), a ValueError will be raised if any individual has only a single recorded parent. If True, the unrecorded parent will be assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, the tau and lambda parameters for gametes contributing to the unrecorded parent will be assumed to be equal to those of the gamete originating from that parent.
- founder_kinship
Optional[Hashable] (default: None)
Optionally specify an input kinship matrix as defined by sgkit.variables.stat_genomic_kinship_spec. Kinship estimates among founders within this matrix will be used to initialize the pedigree estimates as outlined by Goudet et al. 2018 [2]. Kinship estimates for non-founders are ignored.
- founder_indices
Optional[Hashable] (default: None)
Optionally specify an array of integer indices mapping rows/columns in a founder_kinship sub-matrix to sample positions in the samples dimension (i.e., the order of rows in the parent array). This variable must have the same length as founder_kinship.
Deprecated since version 0.7.0: Instead, use a founder_kinship matrix with values for all pairs of samples (these may be nan values).
- merge
bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables.
- Return type:
Dataset
- Returns:
A dataset containing sgkit.variables.stat_pedigree_kinship_spec and, if return_relationship is True, sgkit.variables.stat_pedigree_relationship_spec.
- Raises:
ValueError – If an unknown method is specified.
ValueError – If the pedigree contains a directed loop.
ValueError – If the diploid method is used with a non-diploid dataset.
ValueError – If the diploid method is used and the parents dimension does not have a length of two.
ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.
ValueError – If the pedigree contains half-founders and allow_half_founders=False.
ValueError – If the dimension sizes of founder_kinship are not equal to the number of samples in the pedigree (when founder_indices is not specified).
ValueError – If the founder_kinship and founder_indices variables are both specified and have inconsistent shapes.
Note
This method is faster when a pedigree is sorted in topological order such that parents occur before their children.
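A pedigree can be put into topological order before the call. The following is a minimal sketch using Kahn's algorithm in plain Python. The helper name `topological_order` is hypothetical (it is not part of sgkit), and it assumes parents are given as integer indices with -1 marking an unknown parent.

```python
from collections import deque


def topological_order(parent):
    """Return sample indices ordered so that parents precede their children.

    Hypothetical helper for illustration only (not an sgkit function).
    `parent` is a list of rows of parent indices; -1 is assumed to mean
    "unknown parent".
    """
    n = len(parent)
    # Number of known parents each sample is still waiting for.
    remaining = [sum(1 for p in row if p >= 0) for row in parent]
    # Map each parent to the children that list it (repeats kept for selfing).
    children = [[] for _ in range(n)]
    for child, row in enumerate(parent):
        for p in row:
            if p >= 0:
                children[p].append(child)
    # Start from samples with no known parents (founders).
    ready = deque(i for i in range(n) if remaining[i] == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        for child in children[i]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != n:
        raise ValueError("Pedigree contains a directed loop")
    return order


parent = [[2, 1], [-1, -1], [1, 3], [-1, -1]]
print(topological_order(parent))  # [1, 3, 2, 0]
```

With an xarray-backed dataset, one option is to reorder samples with `ds.isel(samples=order)` before calling the method; any integer parent-index variable already present would then need to be recomputed.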
Note
The diagonal values of sgkit.variables.stat_pedigree_kinship_spec are self-kinship estimates as opposed to inbreeding estimates.
Note
Dimensions of sgkit.variables.stat_pedigree_kinship_spec and sgkit.variables.stat_pedigree_relationship_spec are named samples_0 and samples_1.
Note
If founder kinships are specified for a half-founder, then that individual will be treated as a full-founder by ignoring its known parent.
Note
The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
Examples
Inbred diploid pedigree returning additive relationship matrix:
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_kinship(ds, return_relationship=True)
>>> ds["stat_pedigree_kinship"].values
array([[0.5  , 0.   , 0.25 , 0.375],
       [0.   , 0.5  , 0.25 , 0.125],
       [0.25 , 0.25 , 0.5  , 0.375],
       [0.375, 0.125, 0.375, 0.625]])
>>> ds["stat_pedigree_relationship"].values
array([[1.  , 0.  , 0.5 , 0.75],
       [0.  , 1.  , 0.5 , 0.25],
       [0.5 , 0.5 , 1.  , 0.75],
       [0.75, 0.25, 0.75, 1.25]])
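In this diploid example, each additive relationship value is twice the corresponding kinship value, and the diagonal of the kinship matrix holds self-kinship rather than inbreeding. The sketch below checks both points in plain Python using the values printed above. It assumes the standard diploid relation F = 2 × self-kinship − 1 and does not call sgkit.

```python
# Kinship values copied from the diploid example output.
kinship = [
    [0.5, 0.0, 0.25, 0.375],
    [0.0, 0.5, 0.25, 0.125],
    [0.25, 0.25, 0.5, 0.375],
    [0.375, 0.125, 0.375, 0.625],
]

# For diploids, the additive relationship is twice the kinship.
relationship = [[2 * k for k in row] for row in kinship]

# Diploid inbreeding coefficient from self-kinship (assumed relation).
inbreeding = [2 * kinship[i][i] - 1 for i in range(len(kinship))]

print(relationship[3])  # [0.75, 0.25, 0.75, 1.25]
print(inbreeding)       # [0.0, 0.0, 0.0, 0.25]
```

The inbreeding value of 0.25 for S3 is the expected result for a parent-offspring mating (S0 × S2, where S2 is a child of S0).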
Inbred diploid pedigree with related founders:
>>> import sgkit as sg
>>> from numpy import nan
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> # add "known" kinships among founders
>>> ds["founder_kinship"] = ["samples_0", "samples_1"], [
...     [0.5, 0.1, nan, nan],
...     [0.1, 0.6, nan, nan],
...     [nan, nan, nan, nan],
...     [nan, nan, nan, nan],
... ]
>>> ds = sg.pedigree_kinship(
...     ds,
...     founder_kinship="founder_kinship",
... )
>>> ds["stat_pedigree_kinship"].values
array([[0.5  , 0.1  , 0.3  , 0.4  ],
       [0.1  , 0.6  , 0.35 , 0.225],
       [0.3  , 0.35 , 0.55 , 0.425],
       [0.4  , 0.225, 0.425, 0.65 ]])
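To see where these numbers come from, the classic diploid kinship recursion can be written out by hand. The sketch below is an illustration in plain Python, not sgkit's implementation: the helper `diploid_kinship` is hypothetical, it assumes parents are given as index pairs with -1 for unknown, it requires parents to be listed before their children, and it does not handle half-founders.

```python
import math


def diploid_kinship(parent, founder_kinship=None):
    """Diploid pedigree kinship by recursion (illustrative sketch only).

    parent: list of (p, q) parent index pairs; -1 is assumed to mean unknown.
    founder_kinship: optional square nested list; NaN entries and entries
    involving non-founders are ignored.
    """
    n = len(parent)
    K = [[0.0] * n for _ in range(n)]
    founders = [i for i, (p, q) in enumerate(parent) if p < 0 and q < 0]
    # Founders default to self-kinship 0.5 and zero kinship with each other.
    for i in founders:
        K[i][i] = 0.5
    # Optionally overwrite founder values with the supplied estimates.
    if founder_kinship is not None:
        for i in founders:
            for j in founders:
                value = founder_kinship[i][j]
                if not math.isnan(value):
                    K[i][j] = value
    done = set(founders)
    for i, (p, q) in enumerate(parent):
        if i in done:
            continue
        if p not in done or q not in done:
            raise ValueError("each non-founder needs two known parents listed before it")
        # Kinship with an already processed sample is the mean over both parents.
        for j in done:
            K[i][j] = K[j][i] = 0.5 * (K[p][j] + K[q][j])
        # Self-kinship depends on the kinship between the two parents.
        K[i][i] = 0.5 * (1.0 + K[p][q])
        done.add(i)
    return K


nan = float("nan")
parent = [(-1, -1), (-1, -1), (0, 1), (0, 2)]
founder_kinship = [
    [0.5, 0.1, nan, nan],
    [0.1, 0.6, nan, nan],
    [nan, nan, nan, nan],
    [nan, nan, nan, nan],
]
K = diploid_kinship(parent, founder_kinship)
print([round(v, 3) for v in K[3]])  # [0.4, 0.225, 0.425, 0.65]
```

Calling `diploid_kinship(parent)` without founder kinships gives the matrix from the first example, since founders then default to a self-kinship of 0.5 and zero kinship with one another.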
Somatic doubling and unreduced gamete:
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S0'],  # somatic doubling encoded as selfing
...     ['S1', 'S2'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [1, 1],
...     [2, 2],  # both 'gametes' are full genomic copies
...     [2, 2],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])
Somatic doubling and unreduced gamete using a third parent column to indicate clonal propagation:
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['.', '.', 'S0'],  # somatic doubling encoded as clone
...     ['S1', 'S2', '.'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [1, 1, 0],
...     [0, 0, 4],  # 4 homologues derived from diploid 'S0'
...     [2, 2, 0],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0, 0, 1/3],  # increased probability of IBD in somatic doubling
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])
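Because the diagonal holds self-kinship, recovering inbreeding coefficients in a mixed-ploidy pedigree requires each sample's ploidy. The sketch below assumes the usual relation self-kinship = (1 + (k − 1)F) / k for an individual of ploidy k. The helper name is hypothetical, and the ploidies are read off the example (S0 and S1 diploid, S2 and S3 tetraploid).

```python
def inbreeding_from_self_kinship(self_kinship, ploidy):
    """Convert self-kinship to an inbreeding coefficient (illustrative sketch).

    Assumes self_kinship = (1 + (ploidy - 1) * F) / ploidy.
    """
    return (ploidy * self_kinship - 1) / (ploidy - 1)


# Diagonal of the Hamilton-Kerr example output and each sample's ploidy.
self_kinship = [0.5, 0.5, 0.5, 0.30416667]
ploidy = [2, 2, 4, 4]

inbreeding = [
    inbreeding_from_self_kinship(k, p) for k, p in zip(self_kinship, ploidy)
]
print([round(f, 4) for f in inbreeding])  # [0.0, 0.0, 0.3333, 0.0722]
```

Under this assumption, the tetraploid S2 produced by somatic doubling has an inbreeding coefficient of 1/3 even though its self-kinship equals that of its non-inbred diploid progenitor S0.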
References
[1] - Matthew G. Hamilton and Richard J. Kerr 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.
[2] - Jérôme Goudet, Tomas Kay and Bruce S. Weir 2018. “How to estimate kinship.” Molecular Ecology 27: 4121-4135.