sgkit.pedigree_sel

sgkit.pedigree_sel(ds, *, samples, ancestor_depth=0, descendant_depth=0, parent='parent', sample_id='sample_id', parent_id='parent_id', sel_samples_0=True, sel_samples_1=True, update_parent_id=True, drop_parent=True)

Return a new dataset in which every array is indexed along the ‘samples’ dimension to keep only the specified samples and, optionally, their relatives.

Parameters:
ds Dataset

Dataset containing pedigree structure.

samples Union[ndarray, Array]

Coordinates of the ‘samples’ dimension or a boolean index with length equal to the ‘samples’ dimension.

ancestor_depth int (default: 0)

Optionally include ancestors of the specified ‘samples’ up to a maximum depth. Use -1 to include ancestors at any depth.

descendant_depth int (default: 0)

Optionally include descendants of the specified ‘samples’ up to a maximum depth. Use -1 to include descendants at any depth.

parent Hashable (default: 'parent')

Input variable name holding parents of each sample as defined by sgkit.variables.parent_spec. If the variable is not present in ds, it will be computed using parent_indices().

sample_id Hashable (default: 'sample_id')

Input variable name holding sample_id as defined by sgkit.variables.sample_id_spec.

parent_id Hashable (default: 'parent_id')

Input variable name holding parent_id as defined by sgkit.variables.parent_id_spec.

sel_samples_0 bool (default: True)

If True (the default) and the dataset contains a ‘samples_0’ dimension, the selection will also be applied to this dimension.

sel_samples_1 bool (default: True)

If True (the default) and the dataset contains a ‘samples_1’ dimension, the selection will also be applied to this dimension.

update_parent_id bool (default: True)

If True (the default), replace each value of the ‘parent_id’ array with the missing value ('.') when the parent it names is not included in the new dataset.

drop_parent bool (default: True)

If True (the default), the ‘parent’ variable will be dropped from the new dataset (this variable is invalidated by selecting samples).

Return type:

Dataset

Returns:

A dataset containing a subset of samples.

Examples

Create a pedigree dataset with three generations

>>> import xarray as xr
>>> from sgkit import pedigree_sel
>>> ds = xr.Dataset()
>>> ds["sample_id"] = "samples", ["S0", "S1", "S2", "S3", "S4"]
>>> ds["parent_id"] = ["samples", "parents"], [
...     [".", "."],
...     [".", "."],
...     ["S0", "S1"],
...     ["S0", "S1"],
...     ["S2", "."]
... ]

Select the first sample using its integer coordinate and include its children

>>> ds1 = pedigree_sel(ds, samples=0, descendant_depth=1)
>>> ds1.sample_id.values  
array(['S0', 'S2', 'S3'], dtype='<U2')
>>> ds1.parent_id.values  
array([['.', '.'],
       ['S0', '.'],
       ['S0', '.']], dtype='<U2')

Select the third sample using a boolean index and include its parents

>>> ds2 = pedigree_sel(ds, samples=[False, False, True, False, False], ancestor_depth=1)
>>> ds2.sample_id.values  
array(['S0', 'S1', 'S2'], dtype='<U2')
>>> ds2.parent_id.values  
array([['.', '.'],
       ['.', '.'],
       ['S0', 'S1']], dtype='<U2')

Select the second sample using its ‘sample_id’ and include all of its descendants

>>> ds = ds.assign_coords(dict(samples=ds.sample_id.values))
>>> ds3 = pedigree_sel(ds, samples="S1", descendant_depth=-1)
>>> ds3.sample_id.values  
array(['S1', 'S2', 'S3', 'S4'], dtype='<U2')
>>> ds3.parent_id.values  
array([['.', '.'],
       ['.', 'S1'],
       ['.', 'S1'],
       ['S2', '.']], dtype='<U2')
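
The selection logic illustrated by the examples above can be sketched in plain Python. This is a minimal illustration only, not sgkit's implementation: the helper name `select_with_relatives` is hypothetical, and the sketch assumes that ancestors and descendants are each traced independently from the initially selected samples, that a depth of -1 means "no limit", and that parent IDs naming dropped samples are replaced with '.' (mirroring `update_parent_id=True`).

```python
def select_with_relatives(
    sample_ids, parent_ids, selected, ancestor_depth=0, descendant_depth=0
):
    """Hypothetical helper (not part of sgkit): return the retained sample
    IDs, in their original order, and their masked parent IDs."""
    known = set(sample_ids)
    # Map each sample to its known parents, ignoring the missing value '.'.
    parents_of = {
        s: [p for p in ps if p != "." and p in known]
        for s, ps in zip(sample_ids, parent_ids)
    }
    # Invert the mapping to get the children of each sample.
    children_of = {s: [] for s in sample_ids}
    for child, ps in parents_of.items():
        for p in ps:
            children_of[p].append(child)

    def walk(neighbours, depth):
        # Breadth-first walk from the selected samples. A negative depth
        # never reaches zero, so the walk runs until no new relatives remain.
        found = set()
        frontier = set(selected)
        while frontier and depth != 0:
            frontier = {n for s in frontier for n in neighbours[s]} - found
            found |= frontier
            depth -= 1
        return found

    keep = set(selected)
    keep |= walk(parents_of, ancestor_depth)
    keep |= walk(children_of, descendant_depth)

    kept_ids = [s for s in sample_ids if s in keep]
    # Mask parent IDs that refer to samples outside the retained subset.
    kept_parents = [
        [p if p in keep else "." for p in ps]
        for s, ps in zip(sample_ids, parent_ids)
        if s in keep
    ]
    return kept_ids, kept_parents


# The same three-generation pedigree used in the examples above.
sample_ids = ["S0", "S1", "S2", "S3", "S4"]
parent_ids = [[".", "."], [".", "."], ["S0", "S1"], ["S0", "S1"], ["S2", "."]]

ids, parents = select_with_relatives(
    sample_ids, parent_ids, ["S1"], descendant_depth=-1
)
print(ids)      # ['S1', 'S2', 'S3', 'S4']
print(parents)  # [['.', '.'], ['.', 'S1'], ['.', 'S1'], ['S2', '.']]
```

On this pedigree the sketch reproduces the sample and parent IDs shown for `ds1`, `ds2` and `ds3`, which makes it a convenient way to reason about what a given combination of `ancestor_depth` and `descendant_depth` is expected to retain.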