sgkit.pedigree_contribution

sgkit.pedigree_contribution(ds, *, method='even', chunks=-1, parent='parent', stat_Hamilton_Kerr_tau='stat_Hamilton_Kerr_tau', merge=True)

Calculate the expected genomic contribution of each sample to each other sample based on pedigree structure.

Parameters:
ds Dataset

Dataset containing pedigree structure.

method Literal['even', 'variable'] (default: 'even')

The method used for estimating genomic contributions. The ‘even’ method assumes that all samples share a single, even ploidy (e.g., diploid) and receive equal contributions from each parent. The ‘variable’ method allows for uneven contributions due to ploidy manipulation and/or clonal reproduction.

chunks Hashable (default: -1)

Optionally specify chunks for the returned array. A single chunk is used by default. Currently, chunking is only supported for a single axis.

parent Hashable (default: 'parent')

Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().

stat_Hamilton_Kerr_tau Hashable (default: 'stat_Hamilton_Kerr_tau')

Input variable name holding stat_Hamilton_Kerr_tau as defined by sgkit.variables.stat_Hamilton_Kerr_tau_spec. This variable is only required for the ‘variable’ method.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.stat_pedigree_contribution_spec.

Raises:
  • ValueError – If an unknown method is specified.

  • ValueError – If the ‘even’ method is specified for an odd-ploidy dataset.

  • ValueError – If the ‘even’ method is specified and the length of the ‘parents’ dimension is not 2.

  • NotImplementedError – If chunking is specified for both axes.

Note

Dimensions of sgkit.variables.stat_pedigree_contribution_spec are named samples_0 and samples_1.

Examples

>>> import xarray as xr
>>> from sgkit import pedigree_contribution
>>> ds = xr.Dataset()
>>> ds["sample_id"] = "samples", ["S0", "S1", "S2", "S3", "S4", "S5"]
>>> ds["parent_id"] = ["samples", "parents"], [
...     [ ".",  "."],
...     [ ".",  "."],
...     ["S1", "S0"],
...     ["S2", "S0"],
...     ["S0", "S2"],
...     ["S1", "S3"]
... ]
>>> ds = pedigree_contribution(ds)
>>> ds.stat_pedigree_contribution.values
array([[1.   , 0.   , 0.5  , 0.75 , 0.75 , 0.375],
       [0.   , 1.   , 0.5  , 0.25 , 0.25 , 0.625],
       [0.   , 0.   , 1.   , 0.5  , 0.5  , 0.25 ],
       [0.   , 0.   , 0.   , 1.   , 0.   , 0.5  ],
       [0.   , 0.   , 0.   , 0.   , 1.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 0.   , 1.   ]])
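
The matrix above can be read as: entry (i, j) is the expected contribution of sample i to sample j. As a rough illustration of the arithmetic behind the ‘even’ method, the following NumPy sketch reproduces it for the same pedigree. It is not sgkit's implementation: the helper name even_contribution is hypothetical, parents are given as integer indices with -1 for unknown (an assumption made for this sketch), and samples are assumed to be listed so that every parent precedes its offspring.

```python
import numpy as np


def even_contribution(parent):
    """Hypothetical helper sketching the 'even' method.

    parent[j] holds the indices of the two parents of sample j,
    with -1 marking an unknown parent. Assumes every parent has a
    smaller index than its offspring.
    """
    parent = np.asarray(parent)
    n_samples = parent.shape[0]
    contrib = np.zeros((n_samples, n_samples))
    for j in range(n_samples):
        # Each sample contributes fully to itself.
        contrib[j, j] = 1.0
        for p in parent[j]:
            if p >= 0:
                # Sample j receives half of whatever each known parent carries.
                contrib[:, j] += 0.5 * contrib[:, p]
    return contrib


# Same pedigree as the example above, with S0..S5 mapped to indices 0..5.
parent = [
    [-1, -1],
    [-1, -1],
    [1, 0],
    [2, 0],
    [0, 2],
    [1, 3],
]
contrib = even_contribution(parent)

# Founders are samples with no known parents; in this pedigree their
# contributions to any given sample add up to 1.
is_founder = (np.asarray(parent) < 0).all(axis=1)
founder_totals = contrib[is_founder].sum(axis=0)
print(contrib)
print(founder_totals)
```

In this sketch, column j lists the ancestors of sample j and row i lists the descendants of sample i, which is why the result is upper triangular for a pedigree sorted with parents first.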