CLI Reference#

vcf2zarr convert#

Convert input VCF(s) directly to vcfzarr (not recommended for large files).

vcf2zarr convert [OPTIONS] VCFS... ZARR_PATH

Options

-f, --force#

Force overwriting of existing directories

-l, --variants-chunk-size <variants_chunk_size>#

Chunk size in the variants dimension

-w, --samples-chunk-size <samples_chunk_size>#

Chunk size in the samples dimension

-v, --verbose#

Increase verbosity

-P, --progress, -Q, --no-progress#

Show progress bars (default: show)

-p, --worker-processes <worker_processes>#

Number of worker processes

--local-alleles, --no-local-alleles#

Use local allele fields to reduce the storage requirements of the output.

Default:

False

Arguments

VCFS#

Required argument(s)

ZARR_PATH#

Required argument
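
As an illustration of how the options above combine, the following Python sketch assembles a `vcf2zarr convert` command line as an argument list. The helper name `convert_command` and the file names are hypothetical; only the flags and the argument order are taken from the reference above. The sketch prints the command instead of running it.

```python
import shlex


def convert_command(vcfs, zarr_path, force=False, worker_processes=None,
                    variants_chunk_size=None, samples_chunk_size=None):
    """Build an argv list for `vcf2zarr convert` (hypothetical helper)."""
    argv = ["vcf2zarr", "convert"]
    if force:
        argv.append("--force")
    if worker_processes is not None:
        argv.extend(["--worker-processes", str(worker_processes)])
    if variants_chunk_size is not None:
        argv.extend(["--variants-chunk-size", str(variants_chunk_size)])
    if samples_chunk_size is not None:
        argv.extend(["--samples-chunk-size", str(samples_chunk_size)])
    # Positional arguments: one or more VCFs, then the output Zarr path.
    argv.extend(list(vcfs) + [zarr_path])
    return argv


# File names below are placeholders, not files shipped with the tool.
cmd = convert_command(["sample.vcf.gz"], "sample.vcz",
                      force=True, worker_processes=4)
print(shlex.join(cmd))
```

If desired, the resulting list can be handed to `subprocess.run(cmd, check=True)` to execute the conversion.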

vcf2zarr inspect#

Inspect an intermediate columnar format or Zarr path.

vcf2zarr inspect [OPTIONS] PATH

Options

-v, --verbose#

Increase verbosity

Arguments

PATH#

Required argument

vcf2zarr mkschema#

Generate a schema for zarr encoding

vcf2zarr mkschema [OPTIONS] ICF_PATH

Arguments

ICF_PATH#

Required argument

Explode#

vcf2zarr explode#

Convert VCF(s) to intermediate columnar format

vcf2zarr explode [OPTIONS] VCFS... ICF_PATH

Options

-f, --force#

Force overwriting of existing directories

-v, --verbose#

Increase verbosity

-c, --column-chunk-size <column_chunk_size>#

Approximate uncompressed size of exploded column chunks in MiB

-C, --compressor <compressor>#

Codec to use for compressing column chunks (Default=zstd).

Options:

lz4 | zstd

-P, --progress, -Q, --no-progress#

Show progress bars (default: show)

-p, --worker-processes <worker_processes>#

Number of worker processes

--local-alleles, --no-local-alleles#

Use local allele fields to reduce the storage requirements of the output.

Default:

False

Arguments

VCFS#

Required argument(s)

ICF_PATH#

Required argument

vcf2zarr dexplode-init#

Initial step for distributed conversion of VCF(s) to intermediate columnar format over some number of partitions.

vcf2zarr dexplode-init [OPTIONS] VCFS... ICF_PATH

Options

-n, --num-partitions <num_partitions>#

Target number of partitions to split into

-f, --force#

Force overwriting of existing directories

-c, --column-chunk-size <column_chunk_size>#

Approximate uncompressed size of exploded column chunks in MiB

-C, --compressor <compressor>#

Codec to use for compressing column chunks (Default=zstd).

Options:

lz4 | zstd

--json#

Output summary data in JSON format

-v, --verbose#

Increase verbosity

-P, --progress, -Q, --no-progress#

Show progress bars (default: show)

-p, --worker-processes <worker_processes>#

Number of worker processes

--local-alleles, --no-local-alleles#

Use local allele fields to reduce the storage requirements of the output.

Default:

False

Arguments

VCFS#

Required argument(s)

ICF_PATH#

Required argument

vcf2zarr dexplode-partition#

Convert a VCF partition to intermediate columnar format. Must be called after the ICF path has been initialised with dexplode-init. By default, partition indexes run from 0 to N-1, where N is the number of partitions reported by dexplode-init. If the --one-based option is specified, partition indexes run from 1 to N, inclusive.

vcf2zarr dexplode-partition [OPTIONS] ICF_PATH PARTITION

Options

-v, --verbose#

Increase verbosity

--one-based#

Partition indexes are interpreted as one-based

Arguments

ICF_PATH#

Required argument

PARTITION#

Required argument
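
The two indexing conventions are easy to confuse when generating one job per partition. The Python sketch below enumerates the `vcf2zarr dexplode-partition` invocations for both cases. The function name `partition_commands` and the path `sample.icf` are hypothetical, and the partition count is assumed to be the value reported by dexplode-init, which may differ from the requested target.

```python
def partition_commands(icf_path, num_partitions, one_based=False):
    """Return one `vcf2zarr dexplode-partition` argv list per partition.

    Hypothetical helper: num_partitions is assumed to be the count
    reported by dexplode-init.
    """
    # Zero-based indexes cover 0 .. N-1; one-based indexes cover 1 .. N.
    start = 1 if one_based else 0
    commands = []
    for index in range(start, start + num_partitions):
        argv = ["vcf2zarr", "dexplode-partition"]
        if one_based:
            argv.append("--one-based")
        argv.extend([icf_path, str(index)])
        commands.append(argv)
    return commands


zero_based = partition_commands("sample.icf", 3)
one_based = partition_commands("sample.icf", 3, one_based=True)
for argv in zero_based + one_based:
    print(" ".join(argv))
```

One-based indexing can be convenient when the partition number comes from a scheduler whose task counter starts at 1.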

vcf2zarr dexplode-finalise#

Final step for distributed conversion of VCF(s) to intermediate columnar format.

vcf2zarr dexplode-finalise [OPTIONS] ICF_PATH

Options

-v, --verbose#

Increase verbosity

Arguments

ICF_PATH#

Required argument

Encode#

vcf2zarr encode#

Convert intermediate columnar format to vcfzarr.

vcf2zarr encode [OPTIONS] ICF_PATH ZARR_PATH

Options

-f, --force#

Force overwriting of existing directories

-v, --verbose#

Increase verbosity

-s, --schema <schema>#

-l, --variants-chunk-size <variants_chunk_size>#

Chunk size in the variants dimension

-w, --samples-chunk-size <samples_chunk_size>#

Chunk size in the samples dimension

-V, --max-variant-chunks <max_variant_chunks>#

Truncate the output in the variants dimension to have this number of chunks. Mainly intended to help with schema tuning.

-M, --max-memory <max_memory>#

An approximate bound on overall memory usage (e.g. 10G).

-P, --progress, -Q, --no-progress#

Show progress bars (default: show)

-p, --worker-processes <worker_processes>#

Number of worker processes

Arguments

ICF_PATH#

Required argument

ZARR_PATH#

Required argument
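
Since `vcf2zarr convert` is not recommended for large files, the same result can be reached in two steps: `explode` to intermediate columnar format, then `encode` to Zarr. The Python sketch below builds both command lines. The helper name `two_step_commands` and the paths are hypothetical; the flags come from the option lists in this reference.

```python
import shlex


def two_step_commands(vcfs, icf_path, zarr_path,
                      worker_processes=None, max_memory=None):
    """Build argv lists for `explode` followed by `encode` (hypothetical helper)."""
    explode = ["vcf2zarr", "explode"]
    encode = ["vcf2zarr", "encode"]
    if worker_processes is not None:
        # Both commands accept --worker-processes.
        for argv in (explode, encode):
            argv.extend(["--worker-processes", str(worker_processes)])
    if max_memory is not None:
        # --max-memory is listed for encode only.
        encode.extend(["--max-memory", max_memory])
    explode.extend(list(vcfs) + [icf_path])
    encode.extend([icf_path, zarr_path])
    return [explode, encode]


# Placeholder paths; the ICF directory written by explode is read by encode.
steps = two_step_commands(["sample.vcf.gz"], "sample.icf", "sample.vcz",
                          worker_processes=2, max_memory="10G")
for argv in steps:
    print(shlex.join(argv))
```

The commands must run in the order returned, because `encode` reads the ICF path that `explode` writes.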

vcf2zarr dencode-init#

Initialise conversion of intermediate format to VCF Zarr. This will set up the specified ZARR_PATH to perform this conversion over some number of partitions.

The output of this command is the actual number of partitions generated (which may be fewer than the requested number if there are not enough chunks in the variants dimension) and a rough lower bound on the amount of memory required to encode a partition.

NOTE: the format of this output will likely change in subsequent releases; it should not be considered machine-readable for now.

vcf2zarr dencode-init [OPTIONS] ICF_PATH ZARR_PATH

Options

-n, --num-partitions <num_partitions>#

Target number of partitions to split into

-f, --force#

Force overwriting of existing directories

-s, --schema <schema>#

-l, --variants-chunk-size <variants_chunk_size>#

Chunk size in the variants dimension

-w, --samples-chunk-size <samples_chunk_size>#

Chunk size in the samples dimension

-V, --max-variant-chunks <max_variant_chunks>#

Truncate the output in the variants dimension to have this number of chunks. Mainly intended to help with schema tuning.

--json#

Output summary data in JSON format

-P, --progress, -Q, --no-progress#

Show progress bars (default: show)

-v, --verbose#

Increase verbosity

Arguments

ICF_PATH#

Required argument

ZARR_PATH#

Required argument

vcf2zarr dencode-partition#

Convert a partition from intermediate columnar format to VCF Zarr. Must be called after the Zarr path has been initialised with dencode-init. By default, partition indexes run from 0 to N-1, where N is the number of partitions reported by dencode-init. If the --one-based option is specified, partition indexes run from 1 to N, inclusive.

vcf2zarr dencode-partition [OPTIONS] ZARR_PATH PARTITION

Options

-v, --verbose#

Increase verbosity

--one-based#

Partition indexes are interpreted as one-based

Arguments

ZARR_PATH#

Required argument

PARTITION#

Required argument

vcf2zarr dencode-finalise#

Final step for distributed conversion of ICF to VCF Zarr.

vcf2zarr dencode-finalise [OPTIONS] ZARR_PATH

Options

-v, --verbose#

Increase verbosity

-P, --progress, -Q, --no-progress#

Show progress bars (default: show)

Arguments

ZARR_PATH#

Required argument
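
The three dencode commands form an init, partition, finalise sequence. The Python sketch below lays out that sequence as argument lists. The helper name `dencode_plan` and the paths are hypothetical. Because the dencode-init output is not yet considered machine-readable, the actual partition count is passed in by hand here instead of being parsed.

```python
def dencode_plan(icf_path, zarr_path, requested_partitions, actual_partitions):
    """Lay out the distributed encode steps (hypothetical helper).

    actual_partitions is the count reported by dencode-init, which may be
    lower than requested_partitions.
    """
    init = ["vcf2zarr", "dencode-init",
            "--num-partitions", str(requested_partitions),
            icf_path, zarr_path]
    # Zero-based partition indexes: 0 .. actual_partitions - 1.
    partitions = [["vcf2zarr", "dencode-partition", zarr_path, str(index)]
                  for index in range(actual_partitions)]
    finalise = ["vcf2zarr", "dencode-finalise", zarr_path]
    return init, partitions, finalise


# Placeholder values: 8 partitions requested, 5 reported by dencode-init.
init, partitions, finalise = dencode_plan("sample.icf", "sample.vcz", 8, 5)
for argv in [init, *partitions, finalise]:
    print(" ".join(argv))
```

Only the partition commands are candidates for running on separate workers; init has to complete before any of them start, and finalise is the last step.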