sgkit.pedigree_inbreeding

sgkit.pedigree_inbreeding(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', allow_half_founders=False, merge=True)

Estimate expected inbreeding coefficients from pedigree structure.

Parameters:
ds Dataset

Dataset containing pedigree structure.

method Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')

The method used for inbreeding estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets, following Hamilton and Kerr 2017 [1].

parent Hashable (default: 'parent')

Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().

stat_Hamilton_Kerr_tau Hashable (default: 'stat_Hamilton_Kerr_tau')

Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.

stat_Hamilton_Kerr_lambda Hashable (default: 'stat_Hamilton_Kerr_lambda')

Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.

allow_half_founders bool (default: False)

If False (the default), a ValueError is raised if any individual has only a single recorded parent. If True, the unrecorded parent is assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, the tau and lambda parameters for gametes contributing to the unrecorded parent are assumed to be equal to those of the gamete originating from that parent.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.stat_pedigree_inbreeding_spec.

Raises:
  • ValueError – If an unknown method is specified.

  • ValueError – If the diploid method is used with a non-diploid dataset.

  • ValueError – If the diploid method is used and the parents dimension does not have a length of two.

  • ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.

  • ValueError – If the pedigree contains half-founders and allow_half_founders=False.
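The half-founder condition can be checked before calling the function. The following is a minimal sketch, assuming a two-column parent array in which "." marks an unrecorded parent (the convention used in the examples on this page). The array contents and the variable names `n_recorded` and `is_half_founder` are illustrative and not part of sgkit.

```python
import numpy as np

# Parent identifiers per sample; "." marks an unrecorded parent.
# These values are made up for illustration.
parent_id = np.array([
    [".", "."],    # founder: no recorded parents
    [".", "."],    # founder: no recorded parents
    ["S0", "S1"],  # both parents recorded
    ["S0", "."],   # half-founder: only one parent recorded
])

# Count the recorded parents of each sample.
n_recorded = (parent_id != ".").sum(axis=1)

# With two parent columns, a half-founder has exactly one recorded parent.
is_half_founder = n_recorded == 1
print(is_half_founder)
```

If any entry is True, the pedigree contains half-founders, so the call would need allow_half_founders=True to avoid the ValueError described above.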

Note

This implementation minimizes memory usage by calculating only the minimal subset of kinship coefficients required to calculate inbreeding coefficients. However, if the full kinship matrix has already been calculated, it is more efficient to calculate inbreeding coefficients directly from self-kinship values (i.e., the diagonal values of the kinship matrix).

The inbreeding coefficient of each individual can be calculated from its self-kinship using the formula \(\hat{F}_i=\frac{\hat{\phi}_{ii}k_i - 1}{k_i - 1}\) where \(\hat{\phi}_{ii}\) is a pedigree-based estimate of the self-kinship of individual \(i\) and \(k_i\) is that individual's ploidy.
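As a minimal sketch of this formula, the conversion from self-kinship to inbreeding can be written with NumPy. The function name `inbreeding_from_self_kinship` and the input values are hypothetical and chosen only for illustration. The sketch assumes every ploidy is greater than one, since the denominator is zero otherwise.

```python
import numpy as np

def inbreeding_from_self_kinship(self_kinship, ploidy):
    """Apply F_i = (phi_ii * k_i - 1) / (k_i - 1) element-wise.

    Hypothetical helper, not part of sgkit. Assumes ploidy > 1.
    """
    self_kinship = np.asarray(self_kinship, dtype=float)
    ploidy = np.asarray(ploidy, dtype=float)
    return (self_kinship * ploidy - 1) / (ploidy - 1)

# Illustrative self-kinship values (e.g. the diagonal of a kinship matrix)
# for two diploids followed by two tetraploids.
self_kinship = [0.5, 0.625, 0.25, 0.5]
ploidy = [2, 2, 4, 4]

inbreeding = inbreeding_from_self_kinship(self_kinship, ploidy)
print(inbreeding)
```

For a non-inbred individual the self-kinship is \(1/k_i\), so the numerator vanishes and the inbreeding coefficient is zero, as in the first and third entries here.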

Note

The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
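The row-wise constraint on stat_Hamilton_Kerr_tau can be checked with a few lines of NumPy. This is a sketch under the rule stated above (one or two values greater than zero per row), and the variable names are illustrative rather than part of sgkit.

```python
import numpy as np

# Tau values for four samples and three parent columns (illustrative).
tau = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 4],  # a single contributing parent in the third column
    [2, 2, 0],
])

# A contributing parent is indicated by a value greater than zero.
n_contributing = (tau > 0).sum(axis=1)

# Each sample needs one or two contributing parents.
is_valid = (n_contributing >= 1) & (n_contributing <= 2)
print(n_contributing, is_valid)
```

A row with three positive values, or a row of all zeros, would fail this check.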

Examples

Inbred diploid pedigree:

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_inbreeding(ds)
>>> ds["stat_pedigree_inbreeding"].values 
array([0.  , 0.  , 0.  , 0.25])
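The result above can be reproduced by hand with the standard recursive kinship calculation for diploids. The following is an illustrative sketch only and is not how sgkit computes the result: it builds the full kinship matrix, which the function avoids. It assumes that parents always appear before their offspring and uses -1 to mark an unknown parent.

```python
import numpy as np

# Parent indices for each sample; -1 marks an unknown parent.
# Samples are ordered so that parents precede their offspring.
parent = np.array([
    [-1, -1],  # S0, founder
    [-1, -1],  # S1, founder
    [0, 1],    # S2 = S0 x S1
    [0, 2],    # S3 = S0 x S2
])

n = len(parent)
kinship = np.zeros((n, n))
for i in range(n):
    p, q = parent[i]
    # Kinship of i with each earlier sample j is the mean of the
    # kinships of j with the two parents of i (zero if unknown).
    for j in range(i):
        k_p = kinship[j, p] if p >= 0 else 0.0
        k_q = kinship[j, q] if q >= 0 else 0.0
        kinship[i, j] = kinship[j, i] = 0.5 * (k_p + k_q)
    # Self-kinship depends on the kinship between the two parents.
    k_pq = kinship[p, q] if (p >= 0 and q >= 0) else 0.0
    kinship[i, i] = 0.5 * (1 + k_pq)

# For diploids (k = 2) the formula reduces to F = 2 * phi_ii - 1.
inbreeding = 2 * np.diag(kinship) - 1
print(inbreeding)
```

Sample S3 has parents S0 and S2, whose kinship is 0.25, so its inbreeding coefficient is 0.25.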

Somatic doubling and unreduced gamete:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S0'],  # somatic doubling encoded as selfing
...     ['S1', 'S2'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [1, 1],
...     [2, 2],  # both 'gametes' are full genomic copies
...     [2, 2],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_inbreeding(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inbreeding"].values 
array([0.        , 0.        , 0.33333333, 0.07222222])

Somatic doubling and unreduced gamete using a third parent column to indicate clonal propagation:

>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values 
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['.', '.', 'S0'],  # somatic doubling encoded as clone
...     ['S1', 'S2', '.'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [1, 1, 0],
...     [0, 0, 4],  # 4 homologues derived from diploid 'S0'
...     [2, 2, 0],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0, 0, 1/3],  # increased probability of IBD in somatic doubling
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_inbreeding(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inbreeding"].values 
array([0.        , 0.        , 0.33333333, 0.07222222])

References

[1] Matthew G. Hamilton and Richard J. Kerr. 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.