sgkit.hybrid_relationship#

sgkit.hybrid_relationship(ds, *, genomic_relationship, pedigree_relationship=None, pedigree_subset_inverse_relationship=None, genomic_sample=None, estimator=None, tau=1.0, omega=1.0, merge=True)#

Compute a hybrid relationship matrix (AKA the HRM or H-matrix) combining pedigree and genomic information.

Parameters:
ds Dataset

Dataset containing the inverse genomic and pedigree relationship matrices.

genomic_relationship Hashable

Genomic relationship matrix as defined by sgkit.variables.stat_genomic_relationship_spec. This may include unknown relationships indicated by nan values.

pedigree_relationship Hashable | NoneOptional[Hashable] (default: None)

Pedigree relationship matrix as defined by sgkit.variables.stat_pedigree_relationship_spec.

pedigree_subset_inverse_relationship Hashable | NoneOptional[Hashable] (default: None)

Optionally specify a variable containing the inverse of the subset of the pedigree relationship matrix corresponding to the genomic samples as defined by sgkit.variables.stat_inverse_relationship_spec. If absent, this argument will be automatically computed from the pedigree relationship matrix. If all samples are included in the genomic relationship matrix, then this variable is equivalent to sgkit.variables.stat_pedigree_inverse_relationship_spec and the pedigree_relationship variable is ignored.

genomic_sample Hashable | NoneOptional[Hashable] (default: None)

Optionally specify a variable containing an array of booleans which indicate the subset of samples with genomic relationships. If absent, it is assumed that genomic relationships are present for all samples.

estimator {‘Martini’} | NoneOptional[Literal['Martini']] (default: None)

Specifies the estimator used to combine matrices. Currently the only option is "Martini" following Martini et al 2018 [1] which expands on the estimators of Legarra et al 2009 [2] and Aguiler et al 2010 [3].

tau float (default: 1.0)

Scaling factor for genomic relationships.

omega float (default: 1.0)

Scaling factor for pedigree relationships.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

: A dataset containing sgkit.variables.stat_hybrid_relationship_spec which is a matrix of pairwise relationships among all samples. The dimensions are named samples_0 and samples_1.

Raises:
  • ValueError – If an unknown estimator is specified.

  • ValueError – If the matrices do not have the same dimensions.

Note

This method is more efficient when samples with genomic relationships are adjacent to one another and are the first or last samples in the dataset.

Warning

The pedigree_subset_inverse_relationship parameter must be calculated as the inverse of a subset of the pedigree relationship matrix, rather than a subset of the inverse of the pedigree relationship matrix. See method invert_relationship_matrix().

Examples

>>> import xarray as xr
>>> import sgkit as sg
>>> from numpy import nan
>>> ds = xr.Dataset()
>>> # A-matrix
>>> ds["stat_pedigree_relationship"] = ["samples_0", "samples_1"], [
...     [1.   , 0.   , 0.   , 0.5  , 0.   , 0.25 ],
...     [0.   , 1.   , 0.   , 0.5  , 0.5  , 0.5  ],
...     [0.   , 0.   , 1.   , 0.   , 0.5  , 0.25 ],
...     [0.5  , 0.5  , 0.   , 1.   , 0.25 , 0.625],
...     [0.   , 0.5  , 0.5  , 0.25 , 1.   , 0.625],
...     [0.25 , 0.5  , 0.25 , 0.625, 0.625, 1.125]
... ]
>>> # G-matrix
>>> ds["stat_genomic_relationship"] = ["samples_0", "samples_1"], [
...     [nan,  nan,  nan,  nan,  nan,  nan],
...     [nan,  nan,  nan,  nan,  nan,  nan],
...     [nan,  nan,  nan,  nan,  nan,  nan],
...     [nan,  nan,  nan, 1.3 , 0.3 , 0.71],
...     [nan,  nan,  nan, 0.3 , 1.3 , 0.73],
...     [nan,  nan,  nan, 0.71, 0.73, 1.45],
... ]
>>> # samples included in G-matrix
>>> ds["genomic_sample"] = "samples", [False, False, False, True, True, True]
>>> ds = sg.hybrid_relationship(
...     ds,
...     pedigree_relationship="stat_pedigree_relationship",
...     genomic_relationship="stat_genomic_relationship",
...     genomic_sample="genomic_sample",
... )
>>> ds.stat_hybrid_relationship.values.round(3)  
array([[ 1.084,  0.056, -0.028,  0.653, -0.013,  0.281],
       [ 0.056,  1.112,  0.056,  0.64 ,  0.64 ,  0.576],
       [-0.028,  0.056,  1.084, -0.013,  0.653,  0.295],
       [ 0.653,  0.64 , -0.013,  1.3  ,  0.3  ,  0.71 ],
       [-0.013,  0.64 ,  0.653,  0.3  ,  1.3  ,  0.73 ],
       [ 0.281,  0.576,  0.295,  0.71 ,  0.73 ,  1.45 ]])

References

[1] - J. W. R. Martini, M. F. Schrauf, C. A. Garcia-Baccino, E. C. G. Pimentel, S. Munilla, A. Roberg-Muñoz, R. J. C. Cantet, C. Reimer, N. Gao, V. Wimmer and H. Simianer 2018. “The effect of the \(H^{-1}\) scaling factors \(\tau\) and \(\omega\) on the structure of \(H\) in the single-step procedure.” Genetics Selection Evolution 50 (16).

[2] - A. Legarra, I. Aguilar and I. Misztal 2009. “A relationship matrix including full pedigree and genomic information.” Journal of Dairy Science 92 (9): 4656-4663.

[3] - I. Aguilar, I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta and T. J. Lawlor 2010. “A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.” Journal of Dairy Science 93 (2): 743-752.