sgkit.pedigree_sel
- sgkit.pedigree_sel(ds, *, samples, ancestor_depth=0, descendant_depth=0, parent='parent', sample_id='sample_id', parent_id='parent_id', sel_samples_0=True, sel_samples_1=True, update_parent_id=True, drop_parent=True)
Return a new dataset with each array indexed along the ‘samples’ dimension using a subset of samples and the optional inclusion of their relatives.
- Parameters:
- ds
Dataset
Dataset containing pedigree structure.
- samples
ndarray | Array
Coordinates of the ‘samples’ dimension or a boolean index with length equal to the ‘samples’ dimension.
- ancestor_depth
int (default: 0)
Optionally include ancestors of the specified ‘samples’ up to a maximum depth. Use -1 to include ancestors at any depth.
- descendant_depth
int (default: 0)
Optionally include descendants of the specified ‘samples’ up to a maximum depth. Use -1 to include descendants at any depth.
- parent
Hashable (default: 'parent')
Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().
- sample_id
Hashable (default: 'sample_id')
Input variable name holding sample_id as defined by sgkit.variables.sample_id_spec.
- parent_id
Hashable (default: 'parent_id')
Input variable name holding parent_id as defined by sgkit.variables.parent_id_spec.
- sel_samples_0
bool (default: True)
If True (the default) and the dataset contains a ‘samples_0’ dimension, the selection will also be applied to this dimension.
- sel_samples_1
bool (default: True)
If True (the default) and the dataset contains a ‘samples_1’ dimension, the selection will also be applied to this dimension.
- update_parent_id
bool (default: True)
If True (the default), replace values of the ‘parent_id’ array with the missing value ('.') where the corresponding sample is not included in the new dataset.
- drop_parent
bool (default: True)
If True (the default), the ‘parent’ variable will be dropped from the new dataset (this variable is invalidated by selecting samples).
- Return type:
Dataset
- Returns:
A dataset containing a subset of samples.
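The depth parameters can be pictured as a breadth-first walk over the pedigree. The following is a minimal pure-Python sketch of that idea, not sgkit's implementation: the helper name `select_with_relatives` and its list-based inputs are hypothetical, and it assumes that ancestors and descendants are each collected starting from the originally specified samples only.

```python
from collections import deque

MISSING = "."  # missing-parent marker, as used in the examples on this page


def select_with_relatives(sample_ids, parent_ids, selected,
                          ancestor_depth=0, descendant_depth=0):
    """Hypothetical helper: IDs of `selected` plus relatives within the given depths.

    A depth of -1 means "no limit", mirroring the parameter descriptions.
    """
    # Map each sample to its known parents and to its children.
    parents = {s: [p for p in ps if p != MISSING]
               for s, ps in zip(sample_ids, parent_ids)}
    children = {}
    for s, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(s)

    def walk(neighbours, max_depth):
        # Breadth-first search that stops expanding at max_depth.
        seen = set(selected)
        queue = deque((s, 0) for s in selected)
        while queue:
            node, depth = queue.popleft()
            if max_depth != -1 and depth >= max_depth:
                continue
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        return seen

    keep = walk(parents, ancestor_depth) | walk(children, descendant_depth)
    # Preserve the original sample order.
    return [s for s in sample_ids if s in keep]


# Same three-generation pedigree as in the examples on this page.
sample_ids = ["S0", "S1", "S2", "S3", "S4"]
parent_ids = [[".", "."], [".", "."], ["S0", "S1"], ["S0", "S1"], ["S2", "."]]

children_of_s0 = select_with_relatives(
    sample_ids, parent_ids, ["S0"], descendant_depth=1
)
```

With this pedigree, `children_of_s0` holds S0 together with its two children S2 and S3, while the grandchild S4 is only reached when `descendant_depth` is 2 or -1.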
Examples
Create a pedigree dataset with three generations
>>> import xarray as xr
>>> from sgkit import pedigree_sel
>>> ds = xr.Dataset()
>>> ds["sample_id"] = "samples", ["S0", "S1", "S2", "S3", "S4"]
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S1"],
...     ["S2", "."]
... ]
Select the first sample using its integer coordinate and include its children
>>> ds1 = pedigree_sel(ds, samples=0, descendant_depth=1)
>>> ds1.sample_id.values
array(['S0', 'S2', 'S3'], dtype='<U2')
>>> ds1.parent_id.values
array([['.', '.'],
       ['S0', '.'],
       ['S0', '.']], dtype='<U2')
Select the third sample using a boolean index and include its parents
>>> ds2 = pedigree_sel(ds, samples=[False, False, True, False, False], ancestor_depth=1)
>>> ds2.sample_id.values
array(['S0', 'S1', 'S2'], dtype='<U2')
>>> ds2.parent_id.values
array([['.', '.'],
       ['.', '.'],
       ['S0', 'S1']], dtype='<U2')
Select the second sample using its ‘sample_id’ and include all of its descendants
>>> ds = ds.assign_coords(dict(samples=ds.sample_id.values))
>>> ds3 = pedigree_sel(ds, samples="S1", descendant_depth=-1)
>>> ds3.sample_id.values
array(['S1', 'S2', 'S3', 'S4'], dtype='<U2')
>>> ds3.parent_id.values
array([['.', '.'],
       ['.', 'S1'],
       ['.', 'S1'],
       ['S2', '.']], dtype='<U2')
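The '.' entries in the parent_id outputs above come from update_parent_id=True: a parent that is not among the retained samples is replaced with the missing value. A minimal sketch of that masking step in plain Python, assuming parent IDs are held in nested lists; the helper name `mask_unselected_parents` is hypothetical and not part of sgkit:

```python
MISSING = "."  # missing-parent marker, as used in the examples on this page


def mask_unselected_parents(parent_ids, kept_ids):
    """Hypothetical helper: replace parents that were not retained with MISSING."""
    kept = set(kept_ids)
    return [[p if p in kept else MISSING for p in row] for row in parent_ids]


# Rows for S0, S2 and S3 after selecting S0 and its children (first example).
kept_ids = ["S0", "S2", "S3"]
parent_rows = [[".", "."], ["S0", "S1"], ["S0", "S1"]]

masked = mask_unselected_parents(parent_rows, kept_ids)
```

Here S1 is not among the retained samples, so both children keep S0 as a parent and have S1 replaced by '.', matching the `ds1.parent_id.values` output in the first example.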