Python API
Basic usage:
import bio2zarr.tskit as ts2z
root = ts2z.convert(ts_path, vcz_path, worker_processes=8)
This will convert the tskit tree sequence stored
at ts_path to VCF Zarr stored at vcz_path using 8 worker processes.
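The value 8 above is only an example. As a minimal sketch, one way to derive a value for worker_processes from the machine's core count is shown below. The helper default_worker_processes is hypothetical (it is not part of bio2zarr), and leaving one core free is an assumption rather than a recommendation from the library.

```python
import os


def default_worker_processes(reserve=1):
    """Hypothetical helper: pick a worker count, leaving `reserve` cores free.

    Returns 0 when no cores can be spared, which the convert() documentation
    describes as "use the main process only".
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(0, cpus - reserve)


if __name__ == "__main__":
    # The result could be passed as worker_processes= to ts2z.convert().
    print("worker_processes =", default_worker_processes())
```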
The details of how we map from the tskit data model to VCF Zarr are taken care of by the
tskit.TreeSequence.map_to_vcf_model() method. If the model_mapping parameter to
convert() is not specified, this method is called with no arguments.
For more control over the properties of the output, for example
to pick a specific subset of individuals or to specify properties like
the contig ID and isolated_as_missing, call map_to_vcf_model() yourself
and pass the returned mapping to convert() as model_mapping:
import tskit

ts = tskit.load(ts_path)  # load the tree sequence so the mapping can be built from it
model_mapping = ts.map_to_vcf_model(
    individuals=[0, 1], contig_id="chr1", isolated_as_missing=True
)
root = ts2z.convert(ts, vcz_path, model_mapping=model_mapping)
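If the same subset conversion is needed in several places, the calls above can be wrapped in a small function. The following is a sketch only: convert_subset is a hypothetical name, the default contig ID is an arbitrary assumption, and the imports are done lazily inside the function so that the module can be imported even when tskit or bio2zarr is not installed.

```python
def convert_subset(ts_path, vcz_path, individuals, contig_id="chr1", worker_processes=0):
    """Hypothetical wrapper: convert a subset of individuals to VCF Zarr.

    Uses only the calls shown in this section: tskit.load(),
    TreeSequence.map_to_vcf_model() and bio2zarr.tskit.convert().
    """
    # Lazy imports: the optional dependencies are only needed when called.
    import tskit
    import bio2zarr.tskit as ts2z

    ts = tskit.load(str(ts_path))
    model_mapping = ts.map_to_vcf_model(
        individuals=list(individuals), contig_id=contig_id
    )
    return ts2z.convert(
        ts,
        vcz_path,
        model_mapping=model_mapping,
        worker_processes=worker_processes,
    )
```

A call such as convert_subset(ts_path, vcz_path, individuals=[0, 1]) would then return the same kind of root group as the direct calls above.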
API reference
- bio2zarr.tskit.convert(ts_or_path, vcz_path=None, *, mode='r', model_mapping=None, variants_chunk_size=None, samples_chunk_size=None, worker_processes=0, show_progress=False)
Convert a tskit.TreeSequence (or path to a tree sequence file) to VCF Zarr format.
Parameters
- ts_or_path : tskit.TreeSequence, str, or Path
A tree sequence object or path to a tree sequence file.
- vcz_path : str, Path, or None
Output path for the Zarr store. The output format depends on the value:
None: write to a temporary directory and return an in-memory zarr.storage.MemoryStore-backed group.
Ends with .zip: write to a directory, then package as a zip archive readable via zarr.storage.ZipStore. The intermediate directory is removed.
Otherwise: write directly to the given directory path.
- mode : str
Mode in which the returned zarr.Group is opened. Use "r" (default) for read-only access or "r+" for read-write access.
- model_mapping : dict, optional
A mapping returned by tskit.TreeSequence.map_to_vcf_model() controlling how the tree sequence data model is mapped to VCF. If None, map_to_vcf_model is called with default parameters.
- variants_chunk_size : int, optional
Number of variants per chunk. If None, a default is used.
- samples_chunk_size : int, optional
Number of samples per chunk. If None, a default is used.
- worker_processes : int
Number of worker processes for parallel encoding. 0 (the default) means use the main process only.
- show_progress : bool
If True, display a progress bar during conversion.
Returns
- zarr.Group
The root group of the Zarr store containing the converted data.
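The three cases documented for vcz_path can be summarised as a small decision function. This is a sketch of the documented rule only, not bio2zarr's implementation: describe_output_target is a hypothetical helper, and it assumes the ".zip" check is a plain suffix test on the path string.

```python
from pathlib import Path


def describe_output_target(vcz_path):
    """Hypothetical helper: classify vcz_path as the parameter docs describe."""
    if vcz_path is None:
        # Temporary directory, returned as a MemoryStore-backed group.
        return "memory"
    if str(vcz_path).endswith(".zip"):
        # Written to a directory first, then packaged as a zip archive.
        return "zip"
    # Anything else is treated as a directory path to write into directly.
    return "directory"


if __name__ == "__main__":
    for target in (None, "example.vcz.zip", Path("example.vcz")):
        print(repr(target), "->", describe_output_target(target))
```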