CLI Reference#
vcf2zarr convert#
Convert input VCF(s) directly to vcfzarr (not recommended for large files).
vcf2zarr convert [OPTIONS] VCFS... ZARR_PATH
Options
- -f, --force#
Force overwriting of existing directories
- -l, --variants-chunk-size <variants_chunk_size>#
Chunk size in the variants dimension
- -w, --samples-chunk-size <samples_chunk_size>#
Chunk size in the samples dimension
- -v, --verbose#
Increase verbosity
- -P, --progress, -Q, --no-progress#
Show progress bars (default: show)
- -p, --worker-processes <worker_processes>#
Number of worker processes
- --local-alleles, --no-local-alleles#
Use local allele fields to reduce the storage requirements of the output.
- Default:
False
Arguments
- VCFS#
Required argument(s)
- ZARR_PATH#
Required argument
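Because convert is not recommended for large files, the same result can be approached in two steps with the explode and encode commands documented below. The following is a minimal sketch rather than an official recipe: the function name vcf_to_zarr_cmds and all file names are hypothetical placeholders, and the function only prints the commands so they can be reviewed before anything is run.

```shell
# Print the vcf2zarr command(s) for turning a VCF into a Zarr store.
# Usage: vcf_to_zarr_cmds one-step|two-step VCF ICF_PATH ZARR_PATH
# ICF_PATH is only used in two-step mode. Paths containing spaces would
# need extra quoting; this sketch assumes simple file names.
vcf_to_zarr_cmds() {
    mode=$1
    vcf=$2
    icf_path=$3
    zarr_path=$4
    case "$mode" in
        one-step)
            # Direct conversion (not recommended for large files).
            echo "vcf2zarr convert $vcf $zarr_path"
            ;;
        two-step)
            # Explode to intermediate columnar format, then encode.
            echo "vcf2zarr explode $vcf $icf_path"
            echo "vcf2zarr encode $icf_path $zarr_path"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

# Dry run with placeholder file names.
vcf_to_zarr_cmds two-step sample.vcf.gz sample.icf sample.vcz
```

The printed lines can be pasted into a terminal or piped to sh once the paths have been checked.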
vcf2zarr inspect#
Inspect an intermediate columnar format or Zarr path.
vcf2zarr inspect [OPTIONS] PATH
Options
- -v, --verbose#
Increase verbosity
Arguments
- PATH#
Required argument
vcf2zarr mkschema#
Generate a schema for zarr encoding
vcf2zarr mkschema [OPTIONS] ICF_PATH
Options
- -l, --variants-chunk-size <variants_chunk_size>#
Chunk size in the variants dimension
- -w, --samples-chunk-size <samples_chunk_size>#
Chunk size in the samples dimension
- --local-alleles, --no-local-alleles#
Use local allele fields to reduce the storage requirements of the output.
- Default:
False
Arguments
- ICF_PATH#
Required argument
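A generated schema is typically edited and then passed to encode through its --schema option. The sketch below rests on two assumptions that this reference does not state: that mkschema writes the schema to standard output, and that --schema takes the path of a file holding it. The function name schema_workflow_cmds and the file names are hypothetical, and the function only prints the commands.

```shell
# Print a mkschema / encode command pair.
# Usage: schema_workflow_cmds ICF_PATH SCHEMA_FILE ZARR_PATH
# Assumption: mkschema emits the schema on stdout, so it is redirected
# to SCHEMA_FILE; --schema is assumed to accept that file's path.
schema_workflow_cmds() {
    icf_path=$1
    schema_file=$2
    zarr_path=$3
    echo "vcf2zarr mkschema $icf_path > $schema_file"
    # Edit SCHEMA_FILE by hand between the two commands if needed.
    echo "vcf2zarr encode --schema $schema_file $icf_path $zarr_path"
}

# Dry run with placeholder file names.
schema_workflow_cmds sample.icf sample.schema.json sample.vcz
```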
Explode#
vcf2zarr explode#
Convert VCF(s) to intermediate columnar format
vcf2zarr explode [OPTIONS] VCFS... ICF_PATH
Options
- -f, --force#
Force overwriting of existing directories
- -v, --verbose#
Increase verbosity
- -c, --column-chunk-size <column_chunk_size>#
Approximate uncompressed size of exploded column chunks in MiB
- -C, --compressor <compressor>#
Codec to use for compressing column chunks (Default=zstd).
- Options:
lz4 | zstd
- -P, --progress, -Q, --no-progress#
Show progress bars (default: show)
- -p, --worker-processes <worker_processes>#
Number of worker processes
Arguments
- VCFS#
Required argument(s)
- ICF_PATH#
Required argument
vcf2zarr dexplode-init#
Initial step for distributed conversion of VCF(s) to intermediate columnar format over some number of partitions.
vcf2zarr dexplode-init [OPTIONS] VCFS... ICF_PATH
Options
- -n, --num-partitions <num_partitions>#
Target number of partitions to split into
- -f, --force#
Force overwriting of existing directories
- -c, --column-chunk-size <column_chunk_size>#
Approximate uncompressed size of exploded column chunks in MiB
- -C, --compressor <compressor>#
Codec to use for compressing column chunks (Default=zstd).
- Options:
lz4 | zstd
- --json#
Output summary data in JSON format
- -v, --verbose#
Increase verbosity
- -P, --progress, -Q, --no-progress#
Show progress bars (default: show)
- -p, --worker-processes <worker_processes>#
Number of worker processes
Arguments
- VCFS#
Required argument(s)
- ICF_PATH#
Required argument
vcf2zarr dexplode-partition#
Convert a VCF partition to intermediate columnar format. Must be called after the ICF path has been initialised with dexplode-init. By default, partition indexes run from 0 to N - 1, where N is the number of partitions reported by dexplode-init. If the --one-based option is specified, partition indexes run from 1 to N, inclusive.
vcf2zarr dexplode-partition [OPTIONS] ICF_PATH PARTITION
Options
- -v, --verbose#
Increase verbosity
- --one-based#
Partition indexes are interpreted as one-based
Arguments
- ICF_PATH#
Required argument
- PARTITION#
Required argument
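The two indexing conventions described above can be illustrated by listing one dexplode-partition command per partition. This is a sketch under stated assumptions: the function name dexplode_partition_cmds and the path sample.icf are hypothetical, and the partition count must be the number reported by dexplode-init, not the requested target.

```shell
# Print one dexplode-partition command per partition.
# Usage: dexplode_partition_cmds ICF_PATH N [one-based]
# N is the partition count reported by dexplode-init.
dexplode_partition_cmds() {
    icf_path=$1
    n=$2
    mode=${3:-zero-based}
    if [ "$mode" = "one-based" ]; then
        # Indexes 1..N inclusive, passed with --one-based.
        i=1
        while [ "$i" -le "$n" ]; do
            echo "vcf2zarr dexplode-partition --one-based $icf_path $i"
            i=$((i + 1))
        done
    else
        # Default: indexes 0..N-1.
        i=0
        while [ "$i" -lt "$n" ]; do
            echo "vcf2zarr dexplode-partition $icf_path $i"
            i=$((i + 1))
        done
    fi
}

# Dry run for 3 partitions with the default zero-based indexes.
dexplode_partition_cmds sample.icf 3
```

The one-based form may be convenient when a job scheduler numbers its tasks from 1, since the task number can then be passed through unchanged.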
vcf2zarr dexplode-finalise#
Final step for distributed conversion of VCF(s) to intermediate columnar format.
vcf2zarr dexplode-finalise [OPTIONS] ICF_PATH
Options
- -v, --verbose#
Increase verbosity
Arguments
- ICF_PATH#
Required argument
Encode#
vcf2zarr encode#
Convert intermediate columnar format to vcfzarr.
vcf2zarr encode [OPTIONS] ICF_PATH ZARR_PATH
Options
- -f, --force#
Force overwriting of existing directories
- -v, --verbose#
Increase verbosity
- -s, --schema <schema>#
- -l, --variants-chunk-size <variants_chunk_size>#
Chunk size in the variants dimension
- -w, --samples-chunk-size <samples_chunk_size>#
Chunk size in the samples dimension
- -V, --max-variant-chunks <max_variant_chunks>#
Truncate the output in the variants dimension to have this number of chunks. Mainly intended to help with schema tuning.
- -M, --max-memory <max_memory>#
An approximate bound on overall memory usage (e.g. 10G).
- -P, --progress, -Q, --no-progress#
Show progress bars (default: show)
- -p, --worker-processes <worker_processes>#
Number of worker processes
Arguments
- ICF_PATH#
Required argument
- ZARR_PATH#
Required argument
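Since --max-variant-chunks is mainly intended to help with schema tuning, one possible use is to run short, truncated encodes for several candidate chunk sizes and compare the results. The sketch below only prints such commands; the function name trial_encode_cmds, the trial_*.vcz output names, and the choice of 2 chunks are illustrative assumptions, not recommendations from this reference.

```shell
# Print one truncated trial encode per candidate variants chunk size.
# Usage: trial_encode_cmds ICF_PATH SIZE [SIZE...]
# Each trial writes to its own hypothetical trial_<SIZE>.vcz path and
# uses --force so that a previous trial directory is overwritten.
trial_encode_cmds() {
    icf_path=$1
    shift
    for size in "$@"; do
        echo "vcf2zarr encode --force --max-variant-chunks 2 --variants-chunk-size $size $icf_path trial_${size}.vcz"
    done
}

# Dry run with two candidate chunk sizes.
trial_encode_cmds sample.icf 1000 10000
```

Each resulting store could then be examined with vcf2zarr inspect before committing to a full encode.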
vcf2zarr dencode-init#
Initialise conversion of intermediate format to VCF Zarr. This will set up the specified ZARR_PATH to perform this conversion over some number of partitions.
The output of this command is the actual number of partitions generated (which may be fewer than the requested number if there are not enough chunks in the variants dimension) and a rough lower bound on the amount of memory required to encode a partition.
NOTE: the format of this output will likely change in subsequent releases; it should not be considered machine-readable for now.
vcf2zarr dencode-init [OPTIONS] ICF_PATH ZARR_PATH
Options
- -n, --num-partitions <num_partitions>#
Target number of partitions to split into
- -f, --force#
Force overwriting of existing directories
- -s, --schema <schema>#
- -l, --variants-chunk-size <variants_chunk_size>#
Chunk size in the variants dimension
- -w, --samples-chunk-size <samples_chunk_size>#
Chunk size in the samples dimension
- -V, --max-variant-chunks <max_variant_chunks>#
Truncate the output in the variants dimension to have this number of chunks. Mainly intended to help with schema tuning.
- --json#
Output summary data in JSON format
- -P, --progress, -Q, --no-progress#
Show progress bars (default: show)
- -v, --verbose#
Increase verbosity
Arguments
- ICF_PATH#
Required argument
- ZARR_PATH#
Required argument
vcf2zarr dencode-partition#
Convert a partition from intermediate columnar format to VCF Zarr. Must be called after the Zarr path has been initialised with dencode-init. By default, partition indexes run from 0 to N - 1, where N is the number of partitions reported by dencode-init. If the --one-based option is specified, partition indexes run from 1 to N, inclusive.
vcf2zarr dencode-partition [OPTIONS] ZARR_PATH PARTITION
Options
- -v, --verbose#
Increase verbosity
- --one-based#
Partition indexes are interpreted as one-based
Arguments
- ZARR_PATH#
Required argument
- PARTITION#
Required argument
vcf2zarr dencode-finalise#
Final step for distributed conversion of ICF to VCF Zarr.
vcf2zarr dencode-finalise [OPTIONS] ZARR_PATH
Options
- -v, --verbose#
Increase verbosity
- -P, --progress, -Q, --no-progress#
Show progress bars (default: show)
Arguments
- ZARR_PATH#
Required argument
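The three dencode commands are meant to be run in order: dencode-init once, dencode-partition once per partition, and dencode-finalise once at the end. The sketch below prints that sequence as a plan. The function name dencode_plan and the file names are hypothetical; the fourth argument stands in for the actual partition count, which is only known after dencode-init has run and may be smaller than the requested target.

```shell
# Print the ordered command plan for a distributed encode.
# Usage: dencode_plan ICF_PATH ZARR_PATH TARGET ACTUAL
# TARGET is the requested number of partitions; ACTUAL is the number
# that dencode-init reports after it has run.
dencode_plan() {
    icf_path=$1
    zarr_path=$2
    target=$3
    actual=$4
    # Phase 1: run once.
    echo "vcf2zarr dencode-init --num-partitions $target $icf_path $zarr_path"
    # Phase 2: one command per partition, zero-based indexes.
    i=0
    while [ "$i" -lt "$actual" ]; do
        echo "vcf2zarr dencode-partition $zarr_path $i"
        i=$((i + 1))
    done
    # Phase 3: run once, after every partition has finished.
    echo "vcf2zarr dencode-finalise $zarr_path"
}

# Dry run: 8 partitions requested, 3 assumed to be reported by dencode-init.
dencode_plan sample.icf sample.vcz 8 3
```

The output documents the order of the phases rather than a script to pipe straight into a shell, because the partition commands depend on a number that the first command produces.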