sgkit.hybrid_relationship#
- sgkit.hybrid_relationship(ds, *, genomic_relationship, pedigree_relationship=None, pedigree_subset_inverse_relationship=None, genomic_sample=None, estimator=None, tau=1.0, omega=1.0, merge=True)#
Compute a hybrid relationship matrix (AKA the HRM or H-matrix) combining pedigree and genomic information.
- Parameters:
- ds
Dataset
Dataset containing the inverse genomic and pedigree relationship matrices.
- genomic_relationship
Hashable
Genomic relationship matrix as defined by
sgkit.variables.stat_genomic_relationship_spec
. This may include unknown relationships indicated by nan values.- pedigree_relationship
Hashable
|None
Optional
[Hashable
] (default:None
) Pedigree relationship matrix as defined by
sgkit.variables.stat_pedigree_relationship_spec
.- pedigree_subset_inverse_relationship
Hashable
|None
Optional
[Hashable
] (default:None
) Optionally specify a variable containing the inverse of the subset of the pedigree relationship matrix corresponding to the genomic samples as defined by
sgkit.variables.stat_inverse_relationship_spec
. If absent, this argument will be automatically computed from the pedigree relationship matrix. If all samples are included in the genomic relationship matrix, then this variable is equivalent tosgkit.variables.stat_pedigree_inverse_relationship_spec
and the pedigree_relationship variable is ignored.- genomic_sample
Hashable
|None
Optional
[Hashable
] (default:None
) Optionally specify a variable containing an array of booleans which indicate the subset of samples with genomic relationships. If absent, it is assumed that genomic relationships are present for all samples.
- estimator {‘Martini’} |
None
Optional
[Literal
['Martini'
]] (default:None
) Specifies the estimator used to combine matrices. Currently the only option is
"Martini"
following Martini et al 2018 [1] which expands on the estimators of Legarra et al 2009 [2] and Aguiler et al 2010 [3].- tau
float
(default:1.0
) Scaling factor for genomic relationships.
- omega
float
(default:1.0
) Scaling factor for pedigree relationships.
- merge
bool
(default:True
) If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- ds
- Return type:
- Returns:
: A dataset containing
sgkit.variables.stat_hybrid_relationship_spec
which is a matrix of pairwise relationships among all samples. The dimensions are namedsamples_0
andsamples_1
.- Raises:
ValueError – If an unknown estimator is specified.
ValueError – If the matrices do not have the same dimensions.
Note
This method is more efficient when samples with genomic relationships are adjacent to one another and are the first or last samples in the dataset.
Warning
The
pedigree_subset_inverse_relationship
parameter must be calculated as the inverse of a subset of the pedigree relationship matrix, rather than a subset of the inverse of the pedigree relationship matrix. See methodinvert_relationship_matrix()
.See also
Examples
>>> import xarray as xr >>> import sgkit as sg >>> from numpy import nan >>> ds = xr.Dataset() >>> # A-matrix >>> ds["stat_pedigree_relationship"] = ["samples_0", "samples_1"], [ ... [1. , 0. , 0. , 0.5 , 0. , 0.25 ], ... [0. , 1. , 0. , 0.5 , 0.5 , 0.5 ], ... [0. , 0. , 1. , 0. , 0.5 , 0.25 ], ... [0.5 , 0.5 , 0. , 1. , 0.25 , 0.625], ... [0. , 0.5 , 0.5 , 0.25 , 1. , 0.625], ... [0.25 , 0.5 , 0.25 , 0.625, 0.625, 1.125] ... ] >>> # G-matrix >>> ds["stat_genomic_relationship"] = ["samples_0", "samples_1"], [ ... [nan, nan, nan, nan, nan, nan], ... [nan, nan, nan, nan, nan, nan], ... [nan, nan, nan, nan, nan, nan], ... [nan, nan, nan, 1.3 , 0.3 , 0.71], ... [nan, nan, nan, 0.3 , 1.3 , 0.73], ... [nan, nan, nan, 0.71, 0.73, 1.45], ... ] >>> # samples included in G-matrix >>> ds["genomic_sample"] = "samples", [False, False, False, True, True, True] >>> ds = sg.hybrid_relationship( ... ds, ... pedigree_relationship="stat_pedigree_relationship", ... genomic_relationship="stat_genomic_relationship", ... genomic_sample="genomic_sample", ... ) >>> ds.stat_hybrid_relationship.values.round(3) array([[ 1.084, 0.056, -0.028, 0.653, -0.013, 0.281], [ 0.056, 1.112, 0.056, 0.64 , 0.64 , 0.576], [-0.028, 0.056, 1.084, -0.013, 0.653, 0.295], [ 0.653, 0.64 , -0.013, 1.3 , 0.3 , 0.71 ], [-0.013, 0.64 , 0.653, 0.3 , 1.3 , 0.73 ], [ 0.281, 0.576, 0.295, 0.71 , 0.73 , 1.45 ]])
References
[1] - J. W. R. Martini, M. F. Schrauf, C. A. Garcia-Baccino, E. C. G. Pimentel, S. Munilla, A. Roberg-Muñoz, R. J. C. Cantet, C. Reimer, N. Gao, V. Wimmer and H. Simianer 2018. “The effect of the \(H^{-1}\) scaling factors \(\tau\) and \(\omega\) on the structure of \(H\) in the single-step procedure.” Genetics Selection Evolution 50 (16).
[2] - A. Legarra, I. Aguilar and I. Misztal 2009. “A relationship matrix including full pedigree and genomic information.” Journal of Dairy Science 92 (9): 4656-4663.
[3] - I. Aguilar, I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta and T. J. Lawlor 2010. “A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score.” Journal of Dairy Science 93 (2): 743-752.