sgkit.pedigree_kinship

sgkit.pedigree_kinship(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', return_relationship=False, allow_half_founders=False, founder_kinship=None, founder_indices=None, merge=True)

Estimate expected pairwise kinship coefficients from pedigree structure.

This method can optionally return the additive relationship matrix (ARM or A-matrix).

Parameters:
ds Dataset

Dataset containing pedigree structure.

method Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')

The method used for kinship estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets, following Hamilton and Kerr 2017 [1].

parent Hashable (default: 'parent')

Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().

stat_Hamilton_Kerr_tau Hashable (default: 'stat_Hamilton_Kerr_tau')

Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.

stat_Hamilton_Kerr_lambda Hashable (default: 'stat_Hamilton_Kerr_lambda')

Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.

return_relationship bool (default: False)

If True, the additive relationship matrix will be returned in addition to the kinship matrix.

allow_half_founders bool (default: False)

If False (the default), a ValueError is raised if any individual has only a single recorded parent. If True, the unrecorded parent is assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, the tau and lambda parameters for gametes contributing to the unrecorded parent are assumed to be equal to those of the gamete originating from that parent.

founder_kinship Optional[Hashable] (default: None)

Optionally specify an input kinship matrix as defined by sgkit.variables.stat_genomic_kinship_spec. Kinship estimates among founders within this matrix are used to initialize the pedigree estimates, as outlined by Goudet et al. 2018 [2]. Kinship estimates for non-founders are ignored.

founder_indices Optional[Hashable] (default: None)

Optionally specify an array of integer indices mapping rows/columns in a founder_kinship sub-matrix to sample positions in the samples dimension (i.e., the order of rows in the parent array). This variable must have the same length as founder_kinship.

Deprecated since version 0.7.0: Instead, use a ‘founder_kinship’ matrix with values for all pairs of samples (these may be nan values).

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.stat_pedigree_kinship_spec and, if return_relationship is True, sgkit.variables.stat_pedigree_relationship_spec.

Raises:
  • ValueError – If an unknown method is specified.

  • ValueError – If the pedigree contains a directed loop.

  • ValueError – If the diploid method is used with a non-diploid dataset.

  • ValueError – If the diploid method is used and the parents dimension does not have a length of two.

  • ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.

  • ValueError – If the pedigree contains half-founders and allow_half_founders=False.

  • ValueError – If the dimension sizes of founder_kinship are not equal to the number of samples in the pedigree (when founder_indices is not specified).

  • ValueError – If the founder_kinship and founder_indices variables are both specified and have inconsistent shapes.

Note

This method is faster when a pedigree is sorted in topological order such that parents occur before their children.
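A topological order can be found before building the dataset. The following is a minimal sketch using Kahn's algorithm in plain Python. It assumes string sample identifiers and "." as the missing-parent marker, as in the examples on this page. The helper name topological_order is hypothetical and is not part of sgkit.

```python
from collections import deque


def topological_order(parent_id, sample_id, missing="."):
    """Return sample indices ordered so that parents precede their children."""
    index = {s: i for i, s in enumerate(sample_id)}
    n = len(sample_id)
    children = [[] for _ in range(n)]
    n_pending = [0] * n  # recorded parents that have not been emitted yet
    for child, parents in enumerate(parent_id):
        for p in parents:
            if p != missing:
                children[index[p]].append(child)
                n_pending[child] += 1
    queue = deque(i for i in range(n) if n_pending[i] == 0)
    order = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for c in children[i]:
            n_pending[c] -= 1
            if n_pending[c] == 0:
                queue.append(c)
    if len(order) != n:
        # some samples were never released, so they sit on a cycle
        raise ValueError("Pedigree contains a directed loop")
    return order


# an unsorted version of the four-sample pedigree used in the examples
sample_id = ["S3", "S2", "S0", "S1"]
parent_id = [["S0", "S2"], ["S0", "S1"], [".", "."], [".", "."]]
order = topological_order(parent_id, sample_id)
sorted_ids = [sample_id[i] for i in order]
```

Here sorted_ids lists the founders first, and the same index order could be used to reorder the rows of the parent array before computing kinship.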

Note

The diagonal values of sgkit.variables.stat_pedigree_kinship_spec are self-kinship estimates as opposed to inbreeding estimates.
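If inbreeding coefficients are needed, they can be derived from the diagonal. The sketch below assumes the standard relationship between self-kinship K and inbreeding F for an individual of ploidy k, K = (1 + (k - 1) F) / k. That relationship is not stated on this page, and the helper name is hypothetical.

```python
def inbreeding_from_self_kinship(self_kinship, ploidy=2):
    """Invert K = (1 + (ploidy - 1) * F) / ploidy to recover F."""
    return (ploidy * self_kinship - 1) / (ploidy - 1)


# diagonal of the kinship matrix from the first diploid example on this page
diagonal = [0.5, 0.5, 0.5, 0.625]
inbreeding = [inbreeding_from_self_kinship(k) for k in diagonal]
```

For diploids this reduces to F = 2K - 1, so a self-kinship of 0.5 corresponds to a non-inbred individual.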

Note

Dimensions of sgkit.variables.stat_pedigree_kinship_spec and sgkit.variables.stat_pedigree_relationship_spec are named samples_0 and samples_1.

Note

If founder kinships are specified for a half-founder, then that individual will be treated as a full-founder by ignoring its known parent.
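Half-founders can be located before calling the function, for example to decide whether allow_half_founders=True is appropriate. This is a minimal sketch that assumes a two-column parent_id array with "." marking an unrecorded parent, as in the examples on this page. The helper find_half_founders is hypothetical and not part of sgkit.

```python
def find_half_founders(parent_id, missing="."):
    """Return indices of samples with exactly one recorded parent."""
    return [
        i
        for i, parents in enumerate(parent_id)
        if sum(p != missing for p in parents) == 1
    ]


parent_id = [
    [".", "."],    # founder
    [".", "."],    # founder
    ["S0", "."],   # half-founder: only one parent recorded
    ["S0", "S1"],  # both parents recorded
]
half_founders = find_half_founders(parent_id)
```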

Note

The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
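The row constraint described above can be checked up front. The following sketch counts contributing parents per row of a tau array given as nested lists. The helper name check_tau_rows is hypothetical, and the check only mirrors the rule as worded in this note.

```python
def check_tau_rows(tau):
    """Return indices of rows without exactly one or two contributing parents."""
    bad_rows = []
    for i, row in enumerate(tau):
        n_contributing = sum(1 for t in row if t > 0)
        if n_contributing not in (1, 2):
            bad_rows.append(i)
    return bad_rows


# tau values from the three-parent-column example on this page
tau = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 4],
    [2, 2, 0],
]
invalid = check_tau_rows(tau)

# rows with three contributing parents or none at all violate the rule
invalid_bad = check_tau_rows([[1, 1, 2], [0, 0, 0]])
```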

Examples

Inbred diploid pedigree returning additive relationship matrix:

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_kinship(ds, return_relationship=True)
>>> ds["stat_pedigree_kinship"].values 
array([[0.5  , 0.   , 0.25 , 0.375],
       [0.   , 0.5  , 0.25 , 0.125],
       [0.25 , 0.25 , 0.5  , 0.375],
       [0.375, 0.125, 0.375, 0.625]])
>>> ds["stat_pedigree_relationship"].values 
array([[1.  , 0.  , 0.5 , 0.75],
       [0.  , 1.  , 0.5 , 0.25],
       [0.5 , 0.5 , 1.  , 0.75],
       [0.75, 0.25, 0.75, 1.25]])
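
The values above can be reproduced by hand with the usual recursive definition of diploid pedigree kinship. The sketch below is an illustration in plain Python and is not sgkit's implementation. It assumes parents are given as integer indices with -1 for unknown, that the pedigree is sorted so parents precede their children, and that unknown parents are unrelated founders. The function name diploid_kinship is hypothetical.

```python
def diploid_kinship(parent):
    """Pairwise kinship for a topologically sorted diploid pedigree."""
    n = len(parent)
    K = [[0.0] * n for _ in range(n)]
    for i, (p, q) in enumerate(parent):
        # kinship with each earlier sample is the mean of the parents' kinships
        for j in range(i):
            k_pj = K[p][j] if p >= 0 else 0.0
            k_qj = K[q][j] if q >= 0 else 0.0
            K[i][j] = K[j][i] = 0.5 * (k_pj + k_qj)
        # self-kinship depends on the kinship between the two parents
        k_pq = K[p][q] if p >= 0 and q >= 0 else 0.0
        K[i][i] = 0.5 * (1.0 + k_pq)
    return K


# S0 and S1 are founders, S2 = S0 x S1, S3 = S0 x S2
parent = [(-1, -1), (-1, -1), (0, 1), (0, 2)]
kinship = diploid_kinship(parent)
# for diploids the additive relationship is twice the kinship
relationship = [[2 * k for k in row] for row in kinship]
```

In this diploid case the relationship matrix is simply twice the kinship matrix, which matches the two arrays shown in the example.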

Inbred diploid pedigree with related founders:

>>> import sgkit as sg
>>> from numpy import nan
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> # add "known" kinships among founders
>>> ds["founder_kinship"] = ["samples_0", "samples_1"], [
...     [0.5, 0.1, nan, nan],
...     [0.1, 0.6, nan, nan],
...     [nan, nan, nan, nan],
...     [nan, nan, nan, nan],
... ]
>>> ds = sg.pedigree_kinship(
...     ds,
...     founder_kinship="founder_kinship",
... )
>>> ds["stat_pedigree_kinship"].values 
array([[0.5  , 0.1  , 0.3  , 0.4  ],
       [0.1  , 0.6  , 0.35 , 0.225],
       [0.3  , 0.35 , 0.55 , 0.425],
       [0.4  , 0.225, 0.425, 0.65 ]])
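
When founder kinships are only available as a founder-by-founder sub-matrix (the layout used with the deprecated founder_indices argument), a full-size matrix padded with nan values can be built first. This is a minimal sketch in plain Python. The helper expand_founder_kinship is hypothetical and not part of sgkit.

```python
import math


def expand_founder_kinship(sub_matrix, founder_indices, n_samples):
    """Place a founder sub-matrix into an n_samples x n_samples nan matrix."""
    full = [[math.nan] * n_samples for _ in range(n_samples)]
    for a, i in enumerate(founder_indices):
        for b, j in enumerate(founder_indices):
            full[i][j] = sub_matrix[a][b]
    return full


# founders S0 and S1 occupy positions 0 and 1 of the samples dimension
founder_sub_matrix = [
    [0.5, 0.1],
    [0.1, 0.6],
]
full_matrix = expand_founder_kinship(founder_sub_matrix, [0, 1], n_samples=4)
```

The resulting nested list has the same layout as the founder_kinship variable assigned in the example above.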

Somatic doubling and unreduced gamete:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S0'],  # somatic doubling encoded as selfing
...     ['S1', 'S2'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [1, 1],
...     [2, 2],  # both 'gametes' are full genomic copies
...     [2, 2],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values 
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])
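
The self-kinship of 'S3' in this output can be checked by hand. The sketch below is a worked calculation rather than sgkit's implementation. It assumes that the probability of identity by descent within a gamete is lambda + (1 - lambda) * F_parent, and that the offspring's inbreeding is the average over all pairs of homologues within and between its two gametes. Function names are hypothetical.

```python
from math import comb


def inbreeding_from_self_kinship(k_ii, ploidy):
    return (ploidy * k_ii - 1) / (ploidy - 1)


def self_kinship_from_inbreeding(f, ploidy):
    return (1 + (ploidy - 1) * f) / ploidy


def hamilton_kerr_self_kinship(tau_p, lambda_p, f_p, tau_q, lambda_q, f_q, kinship_pq):
    """Self-kinship of an offspring formed from gametes of parents p and q."""
    # probability that two homologues within the same gamete are IBD
    gamete_p = lambda_p + (1 - lambda_p) * f_p
    gamete_q = lambda_q + (1 - lambda_q) * f_q
    ploidy = tau_p + tau_q
    # average IBD probability over all pairs of homologues in the offspring
    f = (
        comb(tau_p, 2) * gamete_p
        + comb(tau_q, 2) * gamete_q
        + tau_p * tau_q * kinship_pq
    ) / comb(ploidy, 2)
    return self_kinship_from_inbreeding(f, ploidy)


# 'S2': two full genomic copies of non-inbred diploid 'S0' (self-kinship 0.5)
k_s2 = hamilton_kerr_self_kinship(2, 0.0, 0.0, 2, 0.0, 0.0, kinship_pq=0.5)
f_s2 = inbreeding_from_self_kinship(k_s2, ploidy=4)

# 'S3': unreduced gamete from 'S1' (lambda = 0.1) and a gamete from 'S2';
# 'S1' and 'S2' are unrelated, so kinship_pq = 0
k_s3 = hamilton_kerr_self_kinship(2, 0.1, 0.0, 2, 0.0, f_s2, kinship_pq=0.0)
```

Under these assumptions k_s2 and k_s3 agree with the diagonal entries for 'S2' and 'S3' in the array above.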

Somatic doubling and unreduced gamete using a third parent column to indicate clonal propagation:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['.', '.', 'S0'],  # somatic doubling encoded as clone
...     ['S1', 'S2', '.'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [1, 1, 0],
...     [0, 0, 4],  # 4 homologues derived from diploid 'S0'
...     [2, 2, 0],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0, 0, 1/3],  # increased probability of IBD in somatic doubling
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_kinship(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_kinship"].values 
array([[0.5       , 0.        , 0.5       , 0.25      ],
       [0.        , 0.5       , 0.        , 0.25      ],
       [0.5       , 0.        , 0.5       , 0.25      ],
       [0.25      , 0.25      , 0.25      , 0.30416667]])

References

[1] - Matthew G. Hamilton and Richard J. Kerr 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.

[2] - Jérôme Goudet, Tomas Kay and Bruce S. Weir 2018. “How to estimate kinship.” Molecular Ecology 27: 4121-4135.