sgkit.parent_indices

sgkit.parent_indices(ds, *, sample_id='sample_id', parent_id='parent_id', missing='.', merge=True)

Calculate the integer indices for the parents of each sample within the samples dimension.

Parameters:
ds Dataset

Dataset containing pedigree structure.

sample_id Hashable (default: 'sample_id')

Input variable name holding sample_id as defined by sgkit.variables.sample_id_spec.

parent_id Hashable (default: 'parent_id')

Input variable name holding parent_id as defined by sgkit.variables.parent_id_spec.

missing Hashable (default: '.')

A value indicating unknown parents within the sgkit.variables.parent_id_spec array.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise, return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.parent_spec.

Raises:
  • ValueError – If the ‘missing’ value is a known sample identifier.

  • KeyError – If a parent identifier is not a known sample identifier.
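The lookup and both error conditions can be illustrated with a small pure-Python sketch. This is not sgkit's implementation: `parent_indices_sketch` is a hypothetical helper that operates on plain lists rather than a Dataset, and it assumes that unknown parents are encoded as -1, as in the output shown in the Examples section.

```python
def parent_indices_sketch(sample_id, parent_id, missing="."):
    """Hypothetical helper, not part of sgkit: map parent IDs to sample positions."""
    if missing in sample_id:
        # The missing marker must not collide with a real sample identifier.
        raise ValueError(f"Missing value {missing!r} is a known sample identifier")
    # Position of each sample along the samples dimension.
    index_of = {s: i for i, s in enumerate(sample_id)}
    # Unknown parents map to -1 (assumed sentinel, matching the documented example).
    index_of[missing] = -1
    # A parent that is not a known sample raises KeyError on lookup.
    return [[index_of[p] for p in parents] for parents in parent_id]


samples = ["S0", "S1", "S2"]
parents = [[".", "."], [".", "."], ["S0", "S1"]]
print(parent_indices_sketch(samples, parents))
```

In this sketch, a parent identifier such as "S9" that does not appear in `samples` fails with KeyError, and passing `missing="S0"` fails with ValueError because "S0" is itself a sample.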

Warning

The resulting indices within sgkit.variables.parent_spec may be invalidated by any alteration to sample ordering, including sorting and the addition or removal of samples.
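To see why reordering matters, the following pure-Python sketch reverses the sample order and compares the stale indices with freshly computed ones. It uses plain lists instead of a Dataset, and `lookup` is a hypothetical stand-in for the index calculation, not an sgkit function.

```python
def lookup(sample_id, parent_id, missing="."):
    # Hypothetical helper (not sgkit API): position of each parent, -1 if unknown.
    pos = {s: i for i, s in enumerate(sample_id)}
    return [[-1 if p == missing else pos[p] for p in row] for row in parent_id]


samples = ["S0", "S1", "S2"]
parent_id = [[".", "."], [".", "."], ["S0", "S1"]]
parent = lookup(samples, parent_id)  # [[-1, -1], [-1, -1], [0, 1]]

# Reverse the sample order, carrying every per-sample array along.
order = [2, 1, 0]
samples_r = [samples[i] for i in order]      # ['S2', 'S1', 'S0']
parent_id_r = [parent_id[i] for i in order]  # [['S0', 'S1'], ['.', '.'], ['.', '.']]
stale = [parent[i] for i in order]           # [[0, 1], [-1, -1], [-1, -1]]

# The stale index 0 now points at 'S2' itself instead of its parent 'S0'.
wrong_parent = samples_r[stale[0][0]]

# Recomputing from the identifiers gives indices valid for the new order.
fresh = lookup(samples_r, parent_id_r)       # [[2, 1], [-1, -1], [-1, -1]]
print(wrong_parent, fresh)
```

The identifiers in `parent_id` stay correct under reordering, so one reasonable approach is to recompute the indices from them after any change to the samples dimension.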

Examples

>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=3, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"]
... ]
>>> sg.parent_indices(ds)["parent"].values
array([[-1, -1],
       [-1, -1],
       [ 0,  1]])