Python API

Basic usage:

import bio2zarr.vcf as v2z

v2z.convert([vcf_path], vcz_path)

To get an in-memory Zarr store back instead of an output store on disk, omit vcz_path:

root = v2z.convert([vcf_path])

To convert to a zip archive:

root = v2z.convert([vcf_path], "output.vcz.zip")

API reference

bio2zarr.vcf.convert(vcfs, vcz_path=None, *, mode='r', variants_chunk_size=None, samples_chunk_size=None, worker_processes=0, local_alleles=None, show_progress=False, icf_path=None)

Convert VCF file(s) to VCF Zarr format.

Parameters

vcfs : list of str or Path

Paths to the VCF/BCF files to convert.

vcz_path : str, Path, or None

Output path for the Zarr store. The output format depends on the value:

  • None: write to a temporary directory and return an in-memory zarr.storage.MemoryStore-backed group.

  • Ends with .zip: write to a directory, then package as a zip archive readable via zarr.storage.ZipStore. The intermediate directory is removed.

  • Otherwise: write directly to the given directory path.
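The three cases above can be sketched as a small classifier. This is an illustration of the documented rules only: `output_kind` is a hypothetical helper, not part of bio2zarr, and the case-sensitive suffix check is an assumption.

```python
from pathlib import Path


def output_kind(vcz_path=None):
    """Classify vcz_path by the three documented cases.

    Hypothetical helper for illustration; not part of bio2zarr.
    """
    if vcz_path is None:
        # No path given: the caller gets an in-memory group back.
        return "memory"
    if str(vcz_path).endswith(".zip"):
        # Assumption: the suffix check is case-sensitive.
        return "zip"
    # Any other path is treated as a directory store.
    return "directory"


print(output_kind())                     # memory
print(output_kind("output.vcz.zip"))     # zip
print(output_kind(Path("output.vcz")))   # directory
```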

mode : str

Mode in which the returned zarr.Group is opened. Use "r" (default) for read-only access or "r+" for read-write access.

variants_chunk_size : int, optional

Number of variants per chunk. If None, a default is used.

samples_chunk_size : int, optional

Number of samples per chunk. If None, a default is used.
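To get a feel for how the two chunk sizes interact, the sketch below computes how many chunks an array with a variants dimension and a samples dimension would be split into. The counts and chunk sizes are made-up illustrative values, not bio2zarr defaults, and `chunk_grid` is a hypothetical helper.

```python
import math


def chunk_grid(num_variants, num_samples, variants_chunk_size, samples_chunk_size):
    """Return (chunks along variants, chunks along samples).

    Hypothetical helper: the last chunk along each dimension may be
    partially filled, hence the ceiling division.
    """
    return (
        math.ceil(num_variants / variants_chunk_size),
        math.ceil(num_samples / samples_chunk_size),
    )


# Illustrative sizes only (not library defaults).
grid = chunk_grid(
    num_variants=1_000_000,
    num_samples=2_500,
    variants_chunk_size=10_000,
    samples_chunk_size=1_000,
)
print(grid)               # (100, 3)
print(grid[0] * grid[1])  # 300 chunks in total
```

Smaller chunks mean more, smaller pieces to read and write; larger chunks mean fewer pieces but more data touched per access.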

worker_processes : int

Number of worker processes for parallel encoding. 0 (the default) means use the main process only.

local_alleles : int, optional

Maximum number of local alleles for the LAA encoding. If None, standard allele encoding is used.

show_progress : bool

If True, display a progress bar during conversion.

icf_path : str, Path, or None

Path for the intermediate columnar format (ICF) data. If None, a temporary directory is used and cleaned up automatically.

Returns

zarr.Group

The root group of the Zarr store containing the converted data.
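As a rough sketch of how the keyword-only parameters combine in one call, the wrapper below assembles them into a dictionary and forwards them to bio2zarr.vcf.convert using the signature documented above. `build_convert_kwargs` and `run_conversion` are hypothetical helpers, and the validation rules are assumptions made for illustration rather than checks the library is known to perform.

```python
def build_convert_kwargs(*, read_write=False, variants_chunk_size=None,
                         samples_chunk_size=None, worker_processes=0,
                         local_alleles=None, show_progress=False,
                         icf_path=None):
    """Assemble keyword arguments for convert (hypothetical helper)."""
    # Assumed sanity checks; not taken from the library.
    if worker_processes < 0:
        raise ValueError("worker_processes must be 0 or a positive integer")
    for name, size in (("variants_chunk_size", variants_chunk_size),
                       ("samples_chunk_size", samples_chunk_size)):
        if size is not None and size < 1:
            raise ValueError(f"{name} must be a positive integer or None")
    return {
        # "r" is read-only, "r+" is read-write, as documented for mode.
        "mode": "r+" if read_write else "r",
        "variants_chunk_size": variants_chunk_size,
        "samples_chunk_size": samples_chunk_size,
        "worker_processes": worker_processes,
        "local_alleles": local_alleles,
        "show_progress": show_progress,
        "icf_path": icf_path,
    }


def run_conversion(vcfs, vcz_path=None, **options):
    """Call convert with the assembled options and return the root group."""
    # Imported lazily so the helper above works without bio2zarr installed.
    import bio2zarr.vcf as v2z

    return v2z.convert(list(vcfs), vcz_path, **build_convert_kwargs(**options))
```

With bio2zarr installed, a call such as `run_conversion(["sample.vcf.gz"], "sample.vcz", worker_processes=4, show_progress=True)` would return the root group described above; the file names here are placeholders.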