sgkit.io.vcf.zarrs_to_dataset

sgkit.io.vcf.zarrs_to_dataset(urls, chunk_length=10000, chunk_width=1000, storage_options=None)

Combine multiple Zarr stores into a single Xarray dataset.

The Zarr stores are concatenated and rechunked to produce a single combined dataset.
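To make the effect of rechunking concrete, the following sketch computes the chunk layout of a combined dataset from the `chunk_length` and `chunk_width` values in the signature above. It is an illustration only and does not call sgkit: it assumes regular chunking in which only the final chunk along each dimension may be smaller, and the part sizes, sample count, and helper names (`chunk_sizes`, `chunk_grid`) are hypothetical.

```python
import math


def chunk_sizes(n, chunk):
    """Sizes of regular chunks covering n items; the last chunk may be smaller."""
    full, rest = divmod(n, chunk)
    return [chunk] * full + ([rest] if rest else [])


def chunk_grid(n_variants, n_samples, chunk_length=10_000, chunk_width=1_000):
    """Number of chunks along the variants and samples dimensions."""
    return (
        math.ceil(n_variants / chunk_length),
        math.ceil(n_samples / chunk_width),
    )


# Hypothetical input: three part stores with different numbers of variants,
# all covering the same 2,504 samples.
part_lengths = [12_000, 7_500, 4_000]
n_samples = 2_504

# Concatenation stacks the parts along the variants dimension.
n_variants = sum(part_lengths)

variant_chunks = chunk_sizes(n_variants, 10_000)
sample_chunks = chunk_sizes(n_samples, 1_000)
grid = chunk_grid(n_variants, n_samples)

print(variant_chunks)  # [10000, 10000, 3500]
print(sample_chunks)   # [1000, 1000, 504]
print(grid)            # (3, 3)
```

Note that the chunk boundaries of the combined dataset do not depend on where one input store ends and the next begins, which is why a rechunking step is needed after concatenation.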

Parameters
urls : Sequence[str]

A list of URLs to the Zarr stores to combine, typically the return value of vcf_to_zarrs().

chunk_length : int (default: 10000)

Length (number of variants) of chunks in which data are stored, by default 10,000.

chunk_width : int (default: 1000)

Width (number of samples) to use when storing chunks in output, by default 1,000.

storage_options : Optional[Dict[str, str]] (default: None)

Any additional parameters for the storage backend (see fsspec.open).

Return type

Dataset

Returns

An Xarray dataset containing the combined contents of the input Zarr stores.
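A minimal usage sketch follows, assuming an sgkit version that exposes `zarrs_to_dataset` at the import path shown in this page's title. The wrapper `combine_parts` and the store paths are hypothetical; the import is done lazily inside the function so the snippet can be loaded even where sgkit is not installed, and the actual call is left commented out because it needs real Zarr stores on disk.

```python
from typing import Dict, Optional, Sequence


def combine_parts(
    urls: Sequence[str],
    storage_options: Optional[Dict[str, str]] = None,
):
    """Open several Zarr stores as one combined Xarray dataset."""
    # Lazy import: only required when the function is actually called.
    from sgkit.io.vcf import zarrs_to_dataset

    return zarrs_to_dataset(
        urls,
        chunk_length=10_000,  # variants per chunk (documented default)
        chunk_width=1_000,    # samples per chunk (documented default)
        storage_options=storage_options,
    )


# Hypothetical part stores, for example the list returned by vcf_to_zarrs().
part_urls = ["parts/part-0.zarr", "parts/part-1.zarr"]

# Uncomment once the stores above exist:
# ds = combine_parts(part_urls)
# print(ds)
```

For stores on remote storage, `storage_options` would carry the backend-specific settings that `fsspec.open` accepts; which keys are valid depends on the filesystem implementation in use.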