Python API
Basic usage:
import bio2zarr.vcf as v2z
v2z.convert([vcf_path], vcz_path)
To get an in-memory Zarr store instead of a persistent on-disk one, omit the output path (the conversion itself still goes through a temporary directory):
root = v2z.convert([vcf_path])
To convert to a zip archive:
root = v2z.convert([vcf_path], "output.vcz.zip")
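The three calls above differ only in the output path. As a rough illustration of the rule documented for vcz_path in the reference, the sketch below classifies a path the same way. Note that output_kind is a hypothetical helper written for this example, not part of bio2zarr, and the library's own check may differ in detail (for instance in how it treats upper-case extensions).

```python
from pathlib import Path


def output_kind(vcz_path):
    """Classify an output path following the documented vcz_path rule.

    Hypothetical helper for illustration only; not part of bio2zarr.
    """
    if vcz_path is None:
        return "memory"      # in-memory group, nothing persistent on disk
    if str(vcz_path).endswith(".zip"):
        return "zip"         # directory is packaged into a zip archive
    return "directory"       # store is written to the given directory path


print(output_kind(None))
print(output_kind("output.vcz.zip"))
print(output_kind(Path("output.vcz")))
```

Run as written, this prints memory, zip, and directory, matching the three usage forms shown above.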
API reference
- bio2zarr.vcf.convert(vcfs, vcz_path=None, *, mode='r', variants_chunk_size=None, samples_chunk_size=None, worker_processes=0, local_alleles=None, show_progress=False, icf_path=None)
Convert VCF file(s) to VCF Zarr format.
Parameters
- vcfs : list of str or Path
Paths to the VCF/BCF files to convert.
- vcz_path : str, Path, or None
Output path for the Zarr store. The output format depends on the value:
  - None: write to a temporary directory and return an in-memory zarr.storage.MemoryStore-backed group.
  - Ends with .zip: write to a directory, then package as a zip archive readable via zarr.storage.ZipStore. The intermediate directory is removed.
  - Otherwise: write directly to the given directory path.
- mode : str
Mode in which the returned zarr.Group is opened. Use "r" (default) for read-only access or "r+" for read-write access.
- variants_chunk_size : int, optional
Number of variants per chunk. If None, a default is used.
- samples_chunk_size : int, optional
Number of samples per chunk. If None, a default is used.
- worker_processes : int
Number of worker processes for parallel encoding. 0 (the default) means use the main process only.
- local_alleles : int, optional
Maximum number of local alleles for the LAA encoding. If None, standard allele encoding is used.
- show_progress : bool
If True, display a progress bar during conversion.
- icf_path : str, Path, or None
Path for the intermediate columnar format (ICF) data. If None, a temporary directory is used and cleaned up automatically.
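To show how the keyword-only parameters combine in practice, here is a minimal sketch of a wrapper that gathers the VCF/BCF files in a directory and passes them to convert with a chosen number of worker processes. Both find_vcfs and convert_directory are hypothetical helpers invented for this example, and the suffix list is an assumption about how the input files are named. Only the convert call and its keyword names come from the signature above.

```python
from pathlib import Path

# Assumed file-name suffixes for VCF/BCF inputs (an illustration, not a bio2zarr rule).
VCF_SUFFIXES = (".vcf", ".vcf.gz", ".bcf")


def find_vcfs(directory):
    """Return the sorted VCF/BCF file paths found directly inside `directory`."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(VCF_SUFFIXES)
    )


def convert_directory(directory, vcz_path=None, worker_processes=0):
    """Convert every VCF/BCF file in `directory` with a single convert call.

    Hypothetical wrapper; bio2zarr is imported lazily so that find_vcfs
    can be used without the package installed.
    """
    import bio2zarr.vcf as v2z

    vcfs = find_vcfs(directory)
    if not vcfs:
        raise FileNotFoundError(f"no VCF/BCF files found in {directory}")
    return v2z.convert(
        vcfs,
        vcz_path,
        worker_processes=worker_processes,  # 0 keeps the work in the main process
        show_progress=True,                 # display a progress bar
    )
```

A call such as convert_directory("vcfs", "cohort.vcz", worker_processes=4), with placeholder paths, would then write a directory store and return its root group.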
Returns
- zarr.Group
The root group of the Zarr store containing the converted data.
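As a quick way to inspect the returned group, the sketch below collects the shapes of a few arrays if they are present. The helper array_shapes is hypothetical, and the default array names (variant_position, call_genotype, sample_id) are assumptions taken from the VCF Zarr naming convention; which arrays actually exist depends on the input files. It relies only on membership tests, indexing by name, and a shape attribute.

```python
def array_shapes(root, names=("variant_position", "call_genotype", "sample_id")):
    """Return {name: shape} for each requested array that exists in `root`.

    Hypothetical helper; `root` is expected to be the zarr.Group returned
    by convert, and names missing from the group are skipped.
    """
    return {name: root[name].shape for name in names if name in root}
```

With root = v2z.convert([vcf_path]), calling array_shapes(root) gives a small dictionary that can be used to check that the variant and sample dimensions look as expected before further analysis.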