sgkit.parent_indices
- sgkit.parent_indices(ds, *, sample_id='sample_id', parent_id='parent_id', missing='.', merge=True)
Calculate the integer indices for the parents of each sample within the samples dimension.
- Parameters:
  - ds (Dataset) – Dataset containing pedigree structure.
  - sample_id (Hashable, default: 'sample_id') – Input variable name holding sample_id as defined by sgkit.variables.sample_id_spec.
  - parent_id (Hashable, default: 'parent_id') – Input variable name holding parent_id as defined by sgkit.variables.parent_id_spec.
  - missing (Hashable, default: '.') – A value indicating unknown parents within the sgkit.variables.parent_id_spec array.
  - merge (bool, default: True) – If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type:
  Dataset
- Returns:
  A dataset containing sgkit.variables.parent_spec.
- Raises:
  - ValueError – If the ‘missing’ value is a known sample identifier.
  - KeyError – If a parent identifier is not a known sample identifier.
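The lookup and the two error conditions listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not sgkit's implementation: the helper name `parent_indices_sketch` is hypothetical, it works on plain lists rather than a Dataset, and it encodes unknown parents as -1 as in the example output on this page.

```python
def parent_indices_sketch(sample_ids, parent_ids, missing="."):
    """Hypothetical stand-in for the documented behavior (not sgkit code)."""
    # Map each sample identifier to its position in the samples dimension.
    index_of = {sample: i for i, sample in enumerate(sample_ids)}
    # The missing marker must not collide with a real sample identifier.
    if missing in index_of:
        raise ValueError(
            f"Missing value {missing!r} is a known sample identifier"
        )
    # Unknown parents are encoded as -1.
    index_of[missing] = -1
    # A parent identifier that is not a known sample raises KeyError here.
    return [[index_of[parent] for parent in row] for row in parent_ids]


result = parent_indices_sketch(
    ["S0", "S1", "S2"],
    [[".", "."], [".", "."], ["S0", "S1"]],
)
print(result)
```

In this sketch the KeyError comes directly from the dictionary lookup, so an unrecognized parent identifier fails loudly instead of being treated as missing.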
Warning
The resulting indices within sgkit.variables.parent_spec may be invalidated by any alterations to sample ordering, including sorting and the addition or removal of samples.

Examples
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=1, n_sample=3, seed=1)
>>> ds.sample_id.values
array(['S0', 'S1', 'S2'], dtype='<U2')
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"]
... ]
>>> sg.parent_indices(ds)["parent"].values
array([[-1, -1],
       [-1, -1],
       [ 0,  1]])
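
The warning above about sample ordering can be illustrated without sgkit. The following is a minimal sketch in plain Python, assuming the same three samples and parent indices as in the example; the helper `parent_names` is hypothetical and exists only to translate indices back into identifiers.

```python
sample_ids = ["S0", "S1", "S2"]
# Parent indices as in the example above: S2 has parents S0 and S1.
parent = [[-1, -1], [-1, -1], [0, 1]]


def parent_names(sample_ids, parent):
    """Hypothetical helper: translate parent indices back to identifiers."""
    return [
        [sample_ids[i] if i >= 0 else "." for i in row]
        for row in parent
    ]


before = parent_names(sample_ids, parent)

# Reverse the sample order but keep the old index values unchanged.
reordered_ids = sample_ids[::-1]
reordered_parent = parent[::-1]
stale = parent_names(reordered_ids, reordered_parent)

print(before)  # S2 -> parents S0 and S1
print(stale)   # S2 now appears to be its own parent
```

Because the indices refer to positions rather than identifiers, they would need to be recomputed from the parent identifiers after any such reordering.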