Python API

Basic usage:

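import bio2zarr.tskit as ts2z

ts_path = "example.trees"  # placeholder: path to an existing tree sequence file
vcz_path = "example.vcz"   # placeholder: output directory for the VCF Zarr store
root = ts2z.convert(ts_path, vcz_path, worker_processes=8)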

This converts the tskit tree sequence stored at ts_path to VCF Zarr stored at vcz_path, using 8 worker processes. The mapping from the tskit data model to VCF Zarr is handled by the tskit.TreeSequence.map_to_vcf_model() method. If the model_mapping parameter of convert() is not specified, this method is called with no arguments.

For more control over the output, for example to select a specific subset of individuals or to set properties such as the contig ID and isolated_as_missing, call map_to_vcf_model() yourself and pass the resulting mapping to convert():

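import tskit
import bio2zarr.tskit as ts2z

ts = tskit.load(ts_path)  # the mapping is computed from a loaded TreeSequence
model_mapping = ts.map_to_vcf_model(
    individuals=[0, 1], contig_id="chr1", isolated_as_missing=True
)
root = ts2z.convert(ts, vcz_path, model_mapping=model_mapping)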

API reference

bio2zarr.tskit.convert(ts_or_path, vcz_path=None, *, mode='r', model_mapping=None, variants_chunk_size=None, samples_chunk_size=None, worker_processes=0, show_progress=False)

Convert a tskit.TreeSequence (or path to a tree sequence file) to VCF Zarr format.

Parameters

ts_or_path : tskit.TreeSequence, str, or Path

A tree sequence object or path to a tree sequence file.

vcz_path : str, Path, or None

Output path for the Zarr store. The output format depends on the value:

  • None: write to a temporary directory and return a group backed by an in-memory zarr.storage.MemoryStore.

  • Ends with .zip: write to a directory, then package as a zip archive readable via zarr.storage.ZipStore. The intermediate directory is removed.

  • Otherwise: write directly to the given directory path.

mode : str

Mode in which the returned zarr.Group is opened. Use "r" (default) for read-only access or "r+" for read-write access.

model_mapping : dict, optional

A mapping returned by tskit.TreeSequence.map_to_vcf_model() controlling how the tree sequence data model is mapped to VCF. If None, map_to_vcf_model is called with default parameters.

variants_chunk_size : int, optional

Number of variants per chunk. If None, a default is used.

samples_chunk_size : int, optional

Number of samples per chunk. If None, a default is used.

worker_processes : int

Number of worker processes for parallel encoding. 0 (the default) runs the whole conversion in the main process.

show_progress : bool

If True, display a progress bar during conversion.

Returns

zarr.Group

The root group of the Zarr store containing the converted data.