sgkit.pedigree_inbreeding
- sgkit.pedigree_inbreeding(ds, *, method='diploid', parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', stat_Hamilton_Kerr_lambda='stat_Hamilton_Kerr_lambda', allow_half_founders=False, merge=True)
Estimate expected inbreeding coefficients from pedigree structure.
- Parameters:
- ds
Dataset
Dataset containing pedigree structure.
- method {‘diploid’, ‘Hamilton-Kerr’}
Literal['diploid', 'Hamilton-Kerr'] (default: 'diploid')
The method used for inbreeding estimation. Defaults to “diploid”, which is only suitable for pedigrees in which all samples are diploids resulting from sexual reproduction. The “Hamilton-Kerr” method is suitable for autopolyploid and mixed-ploidy datasets following Hamilton and Kerr 2017 [1].
- parent
Hashable (default: 'parent')
Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().
- stat_Hamilton_Kerr_tau
Hashable (default: 'stat_Hamilton_Kerr_tau')
Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the “Hamilton-Kerr” method.
- stat_Hamilton_Kerr_lambda
Hashable (default: 'stat_Hamilton_Kerr_lambda')
Input variable name holding stat_Hamilton_Kerr_lambda as defined by sgkit.variables.stat_Hamilton_Kerr_lambda_spec. This variable is only required for the “Hamilton-Kerr” method.
- allow_half_founders
bool (default: False)
If False (the default), a ValueError is raised if any individual has only a single recorded parent. If True, the unrecorded parent is assumed to be a unique founder unrelated to all other founders. If the Hamilton-Kerr method is used with half-founders, the tau and lambda parameters for gametes contributing to the unrecorded parent are assumed to be equal to those of the gamete originating from that parent.
- merge
bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables.
- Return type:
Dataset
- Returns:
A dataset containing sgkit.variables.stat_pedigree_inbreeding_spec.
- Raises:
ValueError – If an unknown method is specified.
ValueError – If the diploid method is used with a non-diploid dataset.
ValueError – If the diploid method is used and the parents dimension does not have a length of two.
ValueError – If the Hamilton-Kerr method is used and a sample has more than two contributing parents.
ValueError – If the pedigree contains half-founders and allow_half_founders=False.
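The half-founder condition behind the last error can be illustrated with a small sketch. It assumes a two-column parent table in which "." marks an unrecorded parent, as in the examples on this page. The helper `find_half_founders` is hypothetical and is not part of sgkit; it only shows which rows would count as half-founders.

```python
# Hypothetical helper for illustration only; it is not part of sgkit.
MISSING = "."  # assumed marker for an unrecorded parent, as in the examples on this page


def find_half_founders(parent_ids):
    """Return indices of samples with exactly one recorded parent.

    Assumes each row lists two parent identifiers.
    """
    half = []
    for i, parents in enumerate(parent_ids):
        n_known = sum(p != MISSING for p in parents)
        if n_known == 1:
            half.append(i)
    return half


parent_ids = [
    [".", "."],    # founder: no recorded parents
    [".", "."],    # founder: no recorded parents
    ["S0", "S1"],  # both parents recorded
    ["S0", "."],   # half-founder: only one recorded parent
]
half_founders = find_half_founders(parent_ids)
print(half_founders)  # [3]
```

A pedigree for which this list is non-empty would need allow_half_founders=True to avoid the ValueError described above.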
Note
This implementation minimizes memory usage by calculating only a minimal subset of kinship coefficients which are required to calculate inbreeding coefficients. However, if the full kinship matrix has already been calculated, it is more efficient to calculate inbreeding coefficients directly from self-kinship values (i.e., the diagonal values of the kinship matrix).
The inbreeding coefficient of each individual can be calculated from its self-kinship using the formula \(\hat{F}_i=\frac{\hat{\phi}_{ii}k_i - 1}{k_i - 1}\), where \(\hat{\phi}_{ii}\) is a pedigree-based estimate of the self-kinship of individual \(i\) and \(k_i\) is that individual's ploidy.
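The formula above can be applied directly to self-kinship values, for example the diagonal of a precomputed kinship matrix. The following is a minimal sketch in plain Python; the function name `inbreeding_from_self_kinship` and the input values are made up for illustration and are not part of sgkit.

```python
def inbreeding_from_self_kinship(self_kinship, ploidy):
    """Apply F_i = (phi_ii * k_i - 1) / (k_i - 1) to each individual.

    self_kinship: self-kinship values (phi_ii), one per individual.
    ploidy: ploidy (k_i) of each individual; must be greater than one.
    """
    return [(phi * k - 1) / (k - 1) for phi, k in zip(self_kinship, ploidy)]


# Illustrative values only (not taken from a real dataset).
diploid = inbreeding_from_self_kinship([0.5, 0.625], [2, 2])
tetraploid = inbreeding_from_self_kinship([0.25, 0.5], [4, 4])

print(diploid)     # [0.0, 0.25]
print(tetraploid)  # approximately [0.0, 0.3333]
```

A non-inbred individual has a self-kinship of \(1/k_i\), which the formula maps to an inbreeding coefficient of zero for any ploidy.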
Note
The Hamilton-Kerr method may be applied to a dataset with more than two parent columns so long as each sample has two or fewer contributing parents as indicated by the stat_Hamilton_Kerr_tau variable. Within this variable, a contributing parent is indicated by a value greater than zero. Each sample must also have at least one (possibly unknown) contributing parent. Therefore, each row of the stat_Hamilton_Kerr_tau variable must have either one or two non-zero values.
Examples
Inbred diploid pedigree:
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S2"]
... ]
>>> ds = sg.pedigree_inbreeding(ds)
>>> ds["stat_pedigree_inbreeding"].values
array([0.  , 0.  , 0.  , 0.25])
Somatic doubling and unreduced gamete:
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.'],
...     ['.', '.'],
...     ['S0', 'S0'],  # somatic doubling encoded as selfing
...     ['S1', 'S2'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1],
...     [1, 1],
...     [2, 2],  # both 'gametes' are full genomic copies
...     [2, 2],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0],
...     [0, 0],
...     [0, 0],
...     [0.1, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_inbreeding(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inbreeding"].values
array([0.        , 0.        , 0.33333333, 0.07222222])
Somatic doubling and unreduced gamete using a third parent column to indicate clonal propagation:
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=4, n_ploidy=4, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2', 'S3'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     ['.', '.', '.'],
...     ['.', '.', '.'],
...     ['.', '.', 'S0'],  # somatic doubling encoded as clone
...     ['S1', 'S2', '.'],  # diploid * tetraploid
... ]
>>> ds["stat_Hamilton_Kerr_tau"] = ["samples", "parents"], [
...     [1, 1, 0],
...     [1, 1, 0],
...     [0, 0, 4],  # 4 homologues derived from diploid 'S0'
...     [2, 2, 0],  # unreduced gamete from diploid 'S1'
... ]
>>> ds["stat_Hamilton_Kerr_lambda"] = ["samples", "parents"], [
...     [0, 0, 0],
...     [0, 0, 0],
...     [0, 0, 1/3],  # increased probability of IBD in somatic doubling
...     [0.1, 0, 0],  # increased probability of IBD in unreduced gamete
... ]
>>> ds = sg.pedigree_inbreeding(ds, method="Hamilton-Kerr")
>>> ds["stat_pedigree_inbreeding"].values
array([0.        , 0.        , 0.33333333, 0.07222222])
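The contributing-parent rule from the note on the Hamilton-Kerr method (each row of stat_Hamilton_Kerr_tau must have one or two non-zero values) can be checked before calling the function. The sketch below is an assumption-laden illustration in plain Python: the helper `invalid_tau_rows` is hypothetical and does not reproduce sgkit's internal validation.

```python
# Hypothetical pre-check for illustration only; it is not part of sgkit.
def count_contributing_parents(tau_row):
    """Count parents with tau greater than zero (contributing parents)."""
    return sum(t > 0 for t in tau_row)


def invalid_tau_rows(tau):
    """Return indices of rows that do not have one or two contributing parents."""
    return [
        i for i, row in enumerate(tau)
        if count_contributing_parents(row) not in (1, 2)
    ]


# Tau values from the three-parent-column example above.
tau = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 4],  # single contributing parent (clone)
    [2, 2, 0],
]
valid_check = invalid_tau_rows(tau)
print(valid_check)  # []

# Made-up rows that break the rule: three contributors, then none.
bad_check = invalid_tau_rows([[1, 1, 1], [0, 0, 0]])
print(bad_check)  # [0, 1]
```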
References
[1] Matthew G. Hamilton and Richard J. Kerr. 2017. “Computation of the inverse additive relationship matrix for autopolyploid and multiple-ploidy populations.” Theoretical and Applied Genetics 131: 851-860.