sgkit.pedigree_kinship

sgkit.pedigree_kinship(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', chunks=None, return_relationship=False, allow_half_founders=False, founder_kinship=None, founder_indices=None, merge=True)

Estimate expected pairwise kinship coefficients from pedigree structure.

This method can optionally return the additive relationship matrix (ARM or A-matrix).
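In the diploid case, expected kinship obeys two classical recursions:

- The kinship of a sample i with an earlier sample j is the mean of the kinships of i's parents p and q with j.
- The self-kinship of i is (1 + kinship(p, q)) / 2.

For diploids, the additive relationship matrix is twice the kinship matrix. The following is a minimal pure-Python sketch of these rules, not sgkit's implementation. It rests on two assumptions of its own: parents are listed before their children, and -1 encodes an unknown parent.

```python
def diploid_kinship(parents):
    """Expected kinship for a diploid pedigree via the tabular recursion.

    parents: list of (p, q) parent-index pairs, -1 = unknown (assumed).
    Assumes every parent index is smaller than its child's index.
    """
    n = len(parents)
    K = [[0.0] * n for _ in range(n)]
    for i, (p, q) in enumerate(parents):
        # an unknown parent is treated as an unrelated founder (kinship 0)
        for j in range(i):
            k_pj = K[p][j] if p >= 0 else 0.0
            k_qj = K[q][j] if q >= 0 else 0.0
            K[i][j] = K[j][i] = 0.5 * (k_pj + k_qj)
        # self-kinship = (1 + F_i) / 2, where F_i = kinship of the parents
        f_i = K[p][q] if p >= 0 and q >= 0 else 0.0
        K[i][i] = 0.5 * (1.0 + f_i)
    return K


def diploid_relationship(K):
    """Additive relationship matrix: twice the kinship for diploids."""
    return [[2.0 * x for x in row] for row in K]


# pedigree from the first example: S2 = S0 x S1, S3 = S0 x S2
K = diploid_kinship([(-1, -1), (-1, -1), (0, 1), (0, 2)])
A = diploid_relationship(K)
```

Here K and A match the stat_pedigree_kinship and stat_pedigree_relationship values printed in the first example.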

Parameters:
ds Dataset

Dataset containing pedigree structure.

method Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')

The method used for kinship estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method, following Hamilton and Kerr 2017 [1], is suitable for autopolyploid and mixed-ploidy datasets.

parent Hashable (default: 'parent')

Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().

stat_Hamilton_Kerr_tau Hashable (default: 'stat_Hamilton_Kerr_tau')

Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.

stat_Hamilton_Kerr_lambda Hashable (default: 'stat_Hamilton_Kerr_lambda')

Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.

chunks Hashable | None (default: None)

Optionally specify chunks for the returned matrices.

return_relationship bool (default: False)

If True, the additive relationship matrix will be returned in addition to the kinship matrix.

allow_half_founders bool (default: False)

If False (the default), a ValueError is raised if any individual has only a single recorded parent. If True, the unrecorded parent is assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, the tau and lambda parameters for gametes contributing to the unrecorded parent are assumed to be equal to those of the gamete originating from that parent.

founder_kinship Hashable | None (default: None)

Optionally specify an input kinship matrix as defined by sgkit.variables.stat_genomic_kinship_spec. Kinship estimates among founders within this matrix will be used to initialize the pedigree estimates as outlined by Goudet et al. 2018 [2]. Kinship estimates for non-founders are ignored.

founder_indices Hashable | None (default: None)

Optionally specify an array of integer indices mapping rows/columns in a founder_kinship sub-matrix to sample positions in the samples dimension (i.e., the order of rows in the parent array). This variable must have the same length as founder_kinship.

Deprecated since version 0.7.0: Instead, use a ‘founder_kinship’ matrix with values for all pairs of samples (these may be nan values).

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables.
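The deprecation note for founder_indices recommends passing a full samples-by-samples founder_kinship matrix instead, with nan for unknown pairs. The conversion can be sketched as follows. expand_founder_kinship is a hypothetical helper, not part of sgkit, and it works on plain nested lists rather than dataset variables.

```python
import math


def expand_founder_kinship(sub_matrix, founder_indices, n_samples):
    """Hypothetical helper: embed a founder-only kinship sub-matrix into a
    full n_samples x n_samples matrix, using nan for every uncovered pair."""
    if len(sub_matrix) != len(founder_indices):
        raise ValueError("sub_matrix and founder_indices have inconsistent shapes")
    full = [[math.nan] * n_samples for _ in range(n_samples)]
    for a, i in enumerate(founder_indices):
        for b, j in enumerate(founder_indices):
            # row/column a of the sub-matrix maps to sample position i
            full[i][j] = sub_matrix[a][b]
    return full


# founders S0 and S1 occupy sample positions 0 and 1 of a 4-sample pedigree
full = expand_founder_kinship([[0.5, 0.1], [0.1, 0.6]], [0, 1], n_samples=4)
```

The result has the same layout as the founder_kinship variable in the second example. It could be assigned to the dataset with dimensions ["samples_0", "samples_1"] in the same way.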

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.stat_pedigree_kinship_spec and, if return_relationship is True, sgkit.variables.stat_pedigree_relationship_spec.

Raises:
  • ValueError – If an unknown method is specified.

  • ValueError – If the pedigree contains a directed loop.

  • ValueError – If the diploid method is used with a non-diploid dataset.

  • ValueError – If the diploid method is used and the parents dimension does not have a length of two.

  • ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.

  • ValueError – If the pedigree contains half-founders and allow_half_founders=False.

  • ValueError – If the dimension sizes of founder_kinship are not equal to the number of samples in the pedigree (when founder_indices is not specified).

  • ValueError – If the founder_kinship and founder_indices variables are both specified and have inconsistent shapes.
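The half-founder check behind the allow_half_founders parameter can be illustrated on a two-column pedigree. A half-founder is a sample with exactly one recorded parent. The functions below are hypothetical and mimic the documented behavior. They assume -1 marks an unrecorded parent. They are not suited to Hamilton-Kerr pedigrees, where an unused parent column with a tau of zero is not a missing parent.

```python
def find_half_founders(parents):
    """Return indices of samples with exactly one recorded parent.

    parents: rows of parent indices in which -1 marks an unrecorded parent
    (an assumed encoding for this sketch).
    """
    return [i for i, row in enumerate(parents) if sum(p >= 0 for p in row) == 1]


def check_half_founders(parents, allow_half_founders=False):
    """Raise ValueError for half-founders unless they are explicitly allowed."""
    half = find_half_founders(parents)
    if half and not allow_half_founders:
        raise ValueError(f"Pedigree contains half-founders: {half}")
    return half


# S2 has only one recorded parent (S0)
parents = [(-1, -1), (-1, -1), (0, -1), (0, 2)]
```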

Note

This method is faster when a pedigree is sorted in topological order such that parents occur before their children.
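Kahn's algorithm is one standard way to put a pedigree into topological order. It also detects the directed loops listed under Raises, because samples on a loop never become ready to place. The sketch below makes its own assumptions: topological_order is a hypothetical name, parents are plain nested lists, and -1 marks an unknown parent. It is not sgkit's implementation.

```python
from collections import deque


def topological_order(parents):
    """Return sample indices ordered so that parents precede children.

    parents: rows of parent indices, -1 = unknown (assumed encoding).
    Raises ValueError if the pedigree contains a directed loop.
    """
    n = len(parents)
    children = [[] for _ in range(n)]
    n_unplaced_parents = [0] * n
    for child, row in enumerate(parents):
        for p in row:
            if p >= 0:
                children[p].append(child)  # selfing adds the child twice
                n_unplaced_parents[child] += 1
    # start from samples with no recorded parents (founders)
    queue = deque(i for i in range(n) if n_unplaced_parents[i] == 0)
    order = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for c in children[i]:
            n_unplaced_parents[c] -= 1
            if n_unplaced_parents[c] == 0:
                queue.append(c)
    if len(order) != n:
        raise ValueError("Pedigree contains a directed loop")
    return order


# children listed before parents: 0 = 1 x 2, 1 = 2 x 3, founders 2 and 3
order = topological_order([(1, 2), (2, 3), (-1, -1), (-1, -1)])
```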

Note

The diagonal values of sgkit.variables.stat_pedigree_kinship_spec are self-kinship estimates as opposed to inbreeding estimates.
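For a k-ploid individual, self-kinship is the probability that two homologues drawn with replacement are IBD: 1/k + (k - 1)/k * F, where F is the inbreeding coefficient. The following is a small sketch of the inverse conversion. The function name is hypothetical.

```python
def inbreeding_from_self_kinship(self_kinship, ploidy=2):
    """Convert a self-kinship value to an inbreeding coefficient.

    Inverts self_kinship = 1/k + (k - 1)/k * F for a k-ploid individual;
    for diploids this reduces to F = 2 * self_kinship - 1.
    """
    return (ploidy * self_kinship - 1) / (ploidy - 1)
```

In the first example, S3 has a self-kinship of 0.625, which gives F = 0.25. This equals the kinship between its parents, S0 and S2.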

Note

Dimensions of sgkit.variables.stat_pedigree_kinship_spec and sgkit.variables.stat_pedigree_relationship_spec are named samples_0 and samples_1.

Note

Chunked kinship computation is implemented by identifying the sub-pedigree corresponding to each output chunk. An intermediate kinship matrix is then calculated which includes the chunk samples and their ancestors. This can be inefficient in deep pedigrees with many generations.
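The sub-pedigree for a chunk is the set of chunk samples together with all of their ancestors. The size of that set is what makes deep pedigrees costly. The following is a minimal sketch of collecting it. sub_pedigree is a hypothetical name, and -1 is assumed to mark an unknown parent.

```python
def sub_pedigree(parents, chunk):
    """Return sorted indices of the chunk samples and all of their ancestors.

    parents: rows of parent indices, -1 = unknown (assumed encoding).
    """
    keep = set()
    stack = list(chunk)
    while stack:
        i = stack.pop()
        if i in keep:
            continue  # already visited via another path
        keep.add(i)
        stack.extend(p for p in parents[i] if p >= 0)
    return sorted(keep)


# pedigree from the first example: S2 = S0 x S1, S3 = S0 x S2
parents = [(-1, -1), (-1, -1), (0, 1), (0, 2)]
```

A chunk containing only S3 still requires every sample, because S3 descends from all of them.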

Note

If founder kinships are specified for a half-founder, then that individual will be treated as a full-founder by ignoring its known parent.

Note

The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
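The row-wise requirement on stat_Hamilton_Kerr_tau can be checked as follows. check_contributing_parents is a hypothetical helper that works on nested lists. Whether sgkit raises the same error for rows with no contributing parent is an assumption of this sketch.

```python
def check_contributing_parents(tau):
    """Raise ValueError unless every row of tau has one or two values > 0."""
    for i, row in enumerate(tau):
        n_contrib = sum(1 for t in row if t > 0)
        if n_contrib == 0:
            raise ValueError(f"sample {i} has no contributing parent")
        if n_contrib > 2:
            raise ValueError(f"sample {i} has more than two contributing parents")


# tau from the example with a third (clonal) parent column: passes the check
check_contributing_parents([[1, 1, 0], [1, 1, 0], [0, 0, 4], [2, 2, 0]])
```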

Examples

Inbred diploid pedigree returning additive relationship matrix:

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_kinship(ds, return_relationship=True)
>>> ds["stat_pedigree_kinship"].values 
array([[0.5  , 0.   , 0.25 , 0.375],
       [0.   , 0.5  , 0.25 , 0.125],
       [0.25 , 0.25 , 0.5  , 0.375],
       [0.375, 0.125, 0.375, 0.625]])
>>> ds["stat_pedigree_relationship"].values 
array([[1.  , 0.  , 0.5 , 0.75],
       [0.  , 1.  , 0.5 , 0.25],
       [0.5 , 0.5 , 1.  , 0.75],
       [0.75, 0.25, 0.75, 1.25]])

Inbred diploid pedigree with related founders:

>>> import sgkit as sg
>>> from numpy import nan
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> # add "known" kinships among founders
>>> ds["founder_kinship"] = ["samples_0", "samples_1"], [
...     [0.5, 0.1, nan, nan],
...     [0.1, 0.6, nan, nan],
...     [nan, nan, nan, nan],
...     [nan, nan, nan, nan],
... ]
>>> ds = sg.pedigree_kinship(
...     ds,
...     founder_kinship="founder_kinship",
... )
>>> ds["stat_pedigree_kinship"].values 
array([[0.5  , 0.1  , 0.3  , 0.4  ],
       [0.1  , 0.6  , 0.35 , 0.225],
       [0.3  , 0.35 , 0.55 , 0.425],
       [0.4  , 0.225, 0.425, 0.65 ]])

Somatic doubling and unreduced gamete:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S0'],  # somatic doubling encoded as selfing
...     ['S1', 'S2'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [1, 1],
...     [2, 2],  # both 'gametes' are full genomic copies
...     [2, 2],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values 
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])

Somatic doubling and unreduced gamete using a third parent column to indicate clonal propagation:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['.', '.', 'S0'],  # somatic doubling encoded as clone
...     ['S1', 'S2', '.'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [1, 1, 0],
...     [0, 0, 4],  # 4 homologues derived from diploid 'S0'
...     [2, 2, 0],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0, 0, 1/3],  # increased probability of IBD in somatic doubling
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values 
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])
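Both Hamilton-Kerr examples can be checked with a small recursion. The sketch below is one reading of the approach of Hamilton and Kerr 2017 [1], written in pure Python and not taken from sgkit. It has two parts:

- Kinship with an earlier sample is the tau-weighted mean over contributing parents.
- Self-kinship combines three terms: drawing the same homologue twice, IBD within each gamete (lambda plus (1 - lambda) times the parent's inbreeding), and kinship between the two parents.

It makes its own assumptions: topological order, -1 for unknown parents, non-inbred founders with self-kinship 1/ploidy, and no half-founders. With the tau and lambda values above, it reproduces both printed matrices, including S3's self-kinship of 73/240 ≈ 0.30416667.

```python
def hamilton_kerr_kinship(parents, tau, lam):
    """Sketch of a Hamilton-Kerr style kinship recursion (pure Python).

    parents: rows of parent indices, -1 = unknown (assumed encoding).
    tau, lam: per-parent gametic ploidy and IBD probability, same shape.
    Assumes parents precede children, founders are non-inbred and there
    are no half-founders.
    """
    n = len(parents)
    ploidy = [sum(row) for row in tau]  # ploidy = sum of gametic ploidies
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # keep (parent, tau, lambda) only for contributing parents (tau > 0)
        contrib = [(p, t, l) for p, t, l in zip(parents[i], tau[i], lam[i]) if t > 0]
        k = ploidy[i]
        if all(p < 0 for p, _, _ in contrib):
            K[i][i] = 1.0 / k  # founder: unrelated to all earlier samples
            continue
        if any(p < 0 for p, _, _ in contrib):
            raise ValueError(f"sample {i} is a half-founder; not handled here")
        # kinship with earlier samples: tau-weighted mean over parents
        for j in range(i):
            K[i][j] = K[j][i] = sum(t * K[p][j] for p, t, _ in contrib) / k
        # self-kinship, accumulated over ordered pairs of homologues
        total = float(k)  # each homologue paired with itself
        for p, t, l in contrib:
            if t > 1:
                # parent's inbreeding from its self-kinship and ploidy
                f_parent = (ploidy[p] * K[p][p] - 1) / (ploidy[p] - 1)
                f_gamete = l + (1 - l) * f_parent
                total += t * (t - 1) * f_gamete  # pairs within one gamete
        if len(contrib) == 2:
            (p, tp, _), (q, tq, _) = contrib
            total += 2 * tp * tq * K[p][q]  # pairs across the two gametes
        K[i][i] = total / (k * k)
    return K


# second Hamilton-Kerr example: the third column encodes clonal propagation
parents = [[-1, -1, -1], [-1, -1, -1], [-1, -1, 0], [1, 2, -1]]
tau = [[1, 1, 0], [1, 1, 0], [0, 0, 4], [2, 2, 0]]
lam = [[0, 0, 0], [0, 0, 0], [0, 0, 1 / 3], [0.1, 0, 0]]
K = hamilton_kerr_kinship(parents, tau, lam)
```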

References

[1] - Matthew G. Hamilton and Richard J. Kerr 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.

[2] - Jérôme Goudet, Tomas Kay and Bruce S. Weir 2018. “How to estimate kinship.” Molecular Ecology 27: 4121-4135.