sgkit.window_by_variant
- sgkit.window_by_variant(ds, *, size, step=None, variant_contig='variant_contig', merge=True)
Add window information to a dataset, measured by number of variants.
Windows are defined over the variants dimension, and are used by some downstream functions to calculate statistics for each window. Windows never span contigs.
- Parameters
- ds : Dataset
  Genotype call dataset.
- size : int
  The window size, measured by number of variants.
- step : Optional[int] (default: None)
  The distance (number of variants) between start positions of windows. Defaults to size.
- variant_contig : Hashable (default: 'variant_contig')
  Name of variable containing variant contig indexes. Defined by sgkit.variables.variant_contig_spec.
- merge : bool (default: True)
  If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type
  Dataset
- Returns
A dataset containing the following variables:
- sgkit.variables.window_contig_spec (windows): The index values of window contigs.
- sgkit.variables.window_start_spec (windows): The index values of window start positions.
- sgkit.variables.window_stop_spec (windows): The index values of window stop positions.
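As an illustration of how these variables might be consumed, the sketch below treats each (window_start, window_stop) pair as a half-open range over the variants dimension and computes a per-window mean. The half-open reading is inferred from the examples on this page, where a window of size 3 starting at 0 stops at 3. The `window_means` helper and the `per_variant_values` list are hypothetical and not part of sgkit.

```python
# Hypothetical helper, not part of sgkit: per-window mean of a per-variant value.
def window_means(values, window_start, window_stop):
    """Return the mean of `values` over each half-open range [start, stop)."""
    means = []
    for start, stop in zip(window_start, window_stop):
        chunk = values[start:stop]  # stop index is excluded
        means.append(sum(chunk) / len(chunk))
    return means


# Window bounds copied from the size=3 example on this page.
window_start = [0, 3, 5, 8]
window_stop = [3, 5, 8, 10]

# Made-up per-variant values, for illustration only.
per_variant_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

means = window_means(per_variant_values, window_start, window_stop)
print(means)
```

Because the last window of each contig can be shorter than size, the helper divides by the actual chunk length rather than by size.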
Examples
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=10, n_sample=2, n_contig=2)
>>> ds.variant_contig.values
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
>>> ds.variant_position.values
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
>>> # Contiguous windows, each with the same number of variants (3)
>>> # except for the last window of each contig
>>> sg.window_by_variant(ds, size=3, merge=False)
<xarray.Dataset>
Dimensions:        (windows: 4)
Dimensions without coordinates: windows
Data variables:
    window_contig  (windows) int64 0 0 1 1
    window_start   (windows) int64 0 3 5 8
    window_stop    (windows) int64 3 5 8 10
>>> # Overlapping windows
>>> sg.window_by_variant(ds, size=3, step=2, merge=False)
<xarray.Dataset>
Dimensions:        (windows: 6)
Dimensions without coordinates: windows
Data variables:
    window_contig  (windows) int64 0 0 0 1 1 1
    window_start   (windows) int64 0 2 4 5 7 9
    window_stop    (windows) int64 3 5 5 8 10 10
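The window boundaries in the examples above can be reproduced with a small pure-Python sketch. This is not sgkit's implementation; it is a simplified model that assumes variants are already grouped by contig, starts a new window every step variants within each contig, and clips each stop at the end of the contig. The function name `sketch_windows_by_variant` is hypothetical.

```python
# Simplified model of per-contig windowing; not sgkit's actual implementation.
def sketch_windows_by_variant(variant_contig, size, step=None):
    """Return (window_contig, window_start, window_stop) as plain lists.

    Assumes `variant_contig` is grouped so that each contig's variants
    are adjacent.
    """
    if step is None:
        step = size  # windows are contiguous by default
    contigs, starts, stops = [], [], []
    n = len(variant_contig)
    i = 0
    while i < n:
        # Find the half-open extent [i, j) of the current contig.
        j = i
        while j < n and variant_contig[j] == variant_contig[i]:
            j += 1
        # Start a window every `step` variants; never run past the contig end.
        for start in range(i, j, step):
            contigs.append(variant_contig[i])
            starts.append(start)
            stops.append(min(start + size, j))
        i = j
    return contigs, starts, stops


variant_contig = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
contiguous = sketch_windows_by_variant(variant_contig, size=3)
overlapping = sketch_windows_by_variant(variant_contig, size=3, step=2)
print(contiguous)
print(overlapping)
```

Under these assumptions, the two calls yield the same contig, start, and stop values as the size=3 and size=3, step=2 outputs shown above.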