plink2zarr

Convert plink binary filesets (.bed, .bim, .fam) to the VCF Zarr specification reliably and in parallel.

See CLI Reference for detailed documentation on command line options.

Conversion of the plink data model to VCF follows the semantics of plink 1.9 as closely as possible. That is, given a binary plink fileset with prefix “fileset” (i.e., fileset.bed, fileset.bim, fileset.fam), running

$ plink2zarr convert fileset out.vcz

should produce the same result in out.vcz as

$ plink1.9 --bfile fileset --keep-allele-order --recode vcf-iid --out tmp
$ vcf2zarr convert tmp.vcf out.vcz

Warning

We follow the same convention as plink 2.0: the A1 allele in the bim file is the VCF ALT, and the A2 allele is the VCF REF.
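
To make the allele convention concrete, here is a minimal Python sketch of how one line of a bim file maps onto VCF fields. It assumes the standard six-column bim layout (chromosome, variant ID, genetic distance, base-pair position, A1, A2). The names `BimVariant` and `parse_bim_line` are hypothetical and for illustration only; they are not part of plink2zarr.

```python
from typing import NamedTuple


class BimVariant(NamedTuple):
    """VCF-style view of one bim record (hypothetical helper type)."""

    contig: str
    variant_id: str
    position: int
    ref: str
    alt: str


def parse_bim_line(line: str) -> BimVariant:
    """Parse one whitespace-delimited bim line into VCF-style fields.

    Assumed column order: chromosome, variant ID, genetic distance,
    base-pair position, A1, A2.
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 bim columns, got {len(fields)}")
    chrom, variant_id, _genetic_distance, bp, a1, a2 = fields
    # Convention described above: A1 becomes ALT, A2 becomes REF.
    return BimVariant(chrom, variant_id, int(bp), ref=a2, alt=a1)


variant = parse_bim_line("1\trs123\t0\t10583\tA\tG")
print(variant.ref, variant.alt)
```

In this sketch the record with A1 = A and A2 = G is reported with REF = G and ALT = A, which is the opposite of what a reader might expect if they assume the first allele column is the reference.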

Note

Currently, only the basic VCF-like data from plink is converted; phenotype and pedigree information are not included. These are planned as future enhancements. Please comment on this issue if you are interested in this functionality.