Python API

Basic usage:

import bio2zarr.plink as p2z

root = p2z.convert(plink_prefix, vcz_path)

This converts the PLINK fileset with the given path prefix (i.e. the shared prefix of the .bed, .bim, and .fam files) to VCF Zarr stored at vcz_path, and returns the root group of the resulting store.
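Because the prefix stands for three files, it can help to confirm that all of them exist before starting a conversion. The following is a minimal sketch, not part of bio2zarr: the helper names plink_fileset_paths, missing_plink_files, and convert_checked are hypothetical, and only the convert call itself comes from the API described here.

```python
from pathlib import Path

PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def plink_fileset_paths(prefix):
    """Return the .bed, .bim and .fam paths that share the given prefix."""
    # Append the extension instead of using Path.with_suffix, so a prefix
    # that already contains dots (e.g. "chr1.v2") is not truncated.
    return [Path(str(prefix) + ext) for ext in PLINK_EXTENSIONS]


def missing_plink_files(prefix):
    """Return the fileset members that do not exist on disk."""
    return [p for p in plink_fileset_paths(prefix) if not p.is_file()]


def convert_checked(prefix, vcz_path):
    """Hypothetical wrapper: convert only if the whole fileset is present."""
    missing = missing_plink_files(prefix)
    if missing:
        names = ", ".join(str(p) for p in missing)
        raise FileNotFoundError(f"Missing PLINK files: {names}")
    # Imported here so the path helpers work without bio2zarr installed.
    import bio2zarr.plink as p2z

    return p2z.convert(prefix, vcz_path)
```

Calling convert_checked("data/example", "example.vcz") would then fail early with a list of the missing files rather than partway through a conversion.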

API reference

bio2zarr.plink.convert(prefix, out=None, *, mode='r', variants_chunk_size=None, samples_chunk_size=None, worker_processes=0, show_progress=False)

Convert a PLINK fileset to VCF Zarr format.

Parameters

prefix : str or Path

Path prefix for the PLINK fileset (i.e. the shared prefix of the .bed, .bim, and .fam files).

out : str, Path, or None

Output path for the Zarr store. The output format depends on the value:

  • None: write to a temporary directory and return an in-memory zarr.storage.MemoryStore-backed group.

  • Ends with .zip: write to a directory, then package as a zip archive readable via zarr.storage.ZipStore. The intermediate directory is removed.

  • Otherwise: write directly to the given directory path.
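The three cases above can be summarised as a small decision function. This is only a sketch of the documented rules, not bio2zarr's actual implementation: output_kind is a hypothetical helper, and treating the ".zip" check as a case-sensitive suffix test is an assumption.

```python
from pathlib import Path


def output_kind(out):
    """Classify an `out` argument following the three documented cases."""
    if out is None:
        return "memory"  # in-memory group, nothing kept on disk
    if str(out).endswith(".zip"):
        return "zip"  # zip archive, readable via zarr.storage.ZipStore
    return "directory"  # Zarr store written directly to this directory


for candidate in (None, "sample.vcz.zip", Path("sample.vcz")):
    print(candidate, "->", output_kind(candidate))
```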

mode : str

Mode in which the returned zarr.Group is opened. Use "r" (default) for read-only access or "r+" for read-write access.

variants_chunk_size : int, optional

Number of variants per chunk. If None, a default is used.

samples_chunk_size : int, optional

Number of samples per chunk. If None, a default is used.

worker_processes : int

Number of worker processes for parallel encoding. 0 (the default) means use the main process only.

show_progress : bool

If True, display a progress bar during conversion.
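To get a feel for the tuning parameters, the sketch below estimates how many chunks a variants-by-samples array would be split into and picks a worker count from the number of available cores. The helpers chunk_grid, pick_worker_processes, and convert_tuned are hypothetical, the chunk sizes are illustrative values rather than bio2zarr's defaults, and the grid calculation assumes arrays are chunked along the variants and samples dimensions as the parameter names suggest.

```python
import math
import os


def chunk_grid(num_variants, num_samples, variants_chunk_size, samples_chunk_size):
    """Return (variant_chunks, sample_chunks) for a variants x samples array."""
    return (
        math.ceil(num_variants / variants_chunk_size),
        math.ceil(num_samples / samples_chunk_size),
    )


def pick_worker_processes(reserve=1):
    """Leave `reserve` cores free; 0 means use the main process only."""
    return max(0, (os.cpu_count() or 1) - reserve)


def convert_tuned(prefix, out):
    """Hypothetical wrapper passing every tuning keyword explicitly."""
    # Imported here so the helpers above work without bio2zarr installed.
    import bio2zarr.plink as p2z

    return p2z.convert(
        prefix,
        out,
        variants_chunk_size=10_000,  # illustrative, not the library default
        samples_chunk_size=1_000,  # illustrative, not the library default
        worker_processes=pick_worker_processes(),
        show_progress=True,
    )


# One million variants and 2,504 samples with the chunk sizes above.
print(chunk_grid(1_000_000, 2_504, 10_000, 1_000))  # prints (100, 3)
```

Whether more workers or different chunk sizes actually help depends on the dataset and the machine, so these values are a starting point for experimentation only.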

Returns

zarr.Group

The root group of the Zarr store containing the converted data.
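Once the root group is returned, it can be inspected like any other zarr.Group. The sketch below lists the arrays directly under the root together with their shapes. The helper names summarize_arrays and convert_and_summarize are hypothetical; the only zarr features relied on are the group's array_keys() method and each array's shape attribute.

```python
def summarize_arrays(root):
    """Map each array directly under `root` to its shape."""
    return {name: tuple(root[name].shape) for name in sorted(root.array_keys())}


def convert_and_summarize(prefix):
    """Convert with out=None (in-memory result) and report array shapes."""
    # Imported here so summarize_arrays is usable without bio2zarr installed.
    import bio2zarr.plink as p2z

    root = p2z.convert(prefix)
    return summarize_arrays(root)
```

With the default mode="r" the group is read-only, which is sufficient for this kind of inspection.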