sgkit.window_by_variant
- sgkit.window_by_variant(ds, *, size, step=None, variant_contig='variant_contig', merge=True)
Add window information to a dataset, measured by number of variants.
Windows are defined over the variants dimension, and are used by some downstream functions to calculate statistics for each window. Windows never span contigs.
- Parameters:
- ds (Dataset): Genotype call dataset.
- size (int): The window size, measured by number of variants.
- step (int | None, default: None): The distance (number of variants) between start positions of windows. Defaults to size.
- variant_contig (Hashable, default: 'variant_contig'): Name of variable containing variant contig indexes. Defined by sgkit.variables.variant_contig_spec.
- merge (bool, default: True): If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type: Dataset
- Returns: A dataset containing the following variables:
- sgkit.variables.window_contig_spec (windows): The index values of window contigs.
- sgkit.variables.window_start_spec (windows): The index values of window start positions.
- sgkit.variables.window_stop_spec (windows): The index values of window stop positions.
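As a rough illustration of how the start and stop index values can be consumed, the sketch below averages a per-variant quantity within each window. It uses plain Python rather than sgkit, the `values` list is made up for illustration, and it assumes that stop indexes are exclusive (which is what the example outputs on this page suggest, since the last stop equals the number of variants).

```python
# Window boundaries for 10 variants on 2 contigs, windowed with size=3
window_start = [0, 3, 5, 8]
window_stop = [3, 5, 8, 10]

# Hypothetical per-variant values (one per variant), invented for this sketch
values = [1.0, 2.0, 3.0, 4.0, 6.0, 0.0, 3.0, 6.0, 5.0, 7.0]

# Treat each window as the half-open range [start, stop) over the variants
window_mean = [
    sum(values[a:b]) / (b - a) for a, b in zip(window_start, window_stop)
]
print(window_mean)  # [2.0, 5.0, 3.0, 6.0]
```

The shorter windows at the end of each contig are averaged over fewer variants, so a downstream statistic may need to account for the differing window lengths.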
Examples
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=10, n_sample=2, n_contig=2)
>>> ds.variant_contig.values
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
>>> ds.variant_position.values
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

>>> # Contiguous windows, each with the same number of variants (3)
>>> # except for the last window of each contig
>>> sg.window_by_variant(ds, size=3, merge=False)
<xarray.Dataset> Size: 96B
Dimensions:        (windows: 4)
Dimensions without coordinates: windows
Data variables:
    window_contig  (windows) int64 32B 0 0 1 1
    window_start   (windows) int64 32B 0 3 5 8
    window_stop    (windows) int64 32B 3 5 8 10

>>> # Overlapping windows
>>> sg.window_by_variant(ds, size=3, step=2, merge=False)
<xarray.Dataset> Size: 144B
Dimensions:        (windows: 6)
Dimensions without coordinates: windows
Data variables:
    window_contig  (windows) int64 48B 0 0 0 1 1 1
    window_start   (windows) int64 48B 0 2 4 5 7 9
    window_stop    (windows) int64 48B 3 5 5 8 10 10
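The windowing rule in these examples can be sketched in plain Python. This is not sgkit's implementation; `windows_by_variant_sketch` is a hypothetical helper written only to show the logic, and it assumes that the variants of each contig are stored contiguously.

```python
def windows_by_variant_sketch(variant_contig, size, step=None):
    """Illustrative only: returns (contigs, starts, stops) as lists."""
    if step is None:
        step = size  # non-overlapping windows by default
    contigs, starts, stops = [], [], []
    n = len(variant_contig)
    i = 0
    while i < n:
        # Find the end (exclusive) of the run of variants on this contig
        j = i
        while j < n and variant_contig[j] == variant_contig[i]:
            j += 1
        # Start a window every `step` variants; clip at the contig end
        # so that no window spans two contigs
        for start in range(i, j, step):
            contigs.append(variant_contig[i])
            starts.append(start)
            stops.append(min(start + size, j))
        i = j
    return contigs, starts, stops


variant_contig = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(windows_by_variant_sketch(variant_contig, size=3))
# ([0, 0, 1, 1], [0, 3, 5, 8], [3, 5, 8, 10])
print(windows_by_variant_sketch(variant_contig, size=3, step=2))
# ([0, 0, 0, 1, 1, 1], [0, 2, 4, 5, 7, 9], [3, 5, 5, 8, 10, 10])
```

Under these assumptions the sketch reproduces the window_contig, window_start and window_stop values shown above, including the truncated windows at the end of each contig.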