sgkit.io.vcf.zarr_array_sizes
- sgkit.io.vcf.zarr_array_sizes(input, *, regions=None, target_part_size='auto')
Make a pass through a VCF/BCF file to determine sizes for storage in Zarr.
By default, the input is processed in parts in parallel. However, if the input is a single file,
target_part_size is None, and regions is None, then the operation is carried out sequentially.
- Parameters
- input
Union[str, Path, Sequence[Union[str, Path]]] A path (or paths) to the input BCF or VCF file (or files). VCF files should be compressed and have a .tbi or .csi index file. BCF files should have a .csi index file.
- target_part_size
Union[None, int, str] (default: 'auto') The desired size, in bytes, of each (compressed) part of the input to be processed in parallel. Defaults to "auto", which will pick a good size (currently 20MB). A value of None means that the input will be processed sequentially. The setting is ignored if regions is also specified.
- regions
Union[None, Sequence[str], Sequence[Optional[Sequence[str]]]] (default: None) Genomic region or regions to extract variants for. For multiple inputs, the regions are specified as a sequence of values, each of which may be None or a sequence of region strings. Takes priority over target_part_size if both are not None.
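The interaction between input, regions, and target_part_size described above can be restated as a small sketch. This is only a paraphrase of the documented rules, not sgkit's implementation: the helper names is_sequential, governing_setting, and scan_sizes are hypothetical, the file name in the comments is a placeholder, and treating only a bare str or Path (not a one-element sequence) as "a single file" is an assumption.

```python
from pathlib import Path
from typing import Optional


def is_sequential(input, regions=None, target_part_size="auto") -> bool:
    """Return True when the documented rules imply a sequential pass.

    Assumption: "a single file" means a bare str or Path, not a sequence.
    """
    single_file = isinstance(input, (str, Path))
    return single_file and regions is None and target_part_size is None


def governing_setting(regions=None, target_part_size="auto") -> Optional[str]:
    """Name the parameter that decides how the input is split into parts.

    regions takes priority over target_part_size when both are not None.
    """
    if regions is not None:
        return "regions"
    if target_part_size is not None:
        return "target_part_size"
    return None


def scan_sizes(path):
    """Hypothetical wrapper: one sequential pass over a single indexed file."""
    # Imported lazily so the helpers above work without sgkit installed.
    from sgkit.io.vcf import zarr_array_sizes

    # target_part_size=None with regions left as None requests sequential
    # processing for a single input, per the description above.
    return zarr_array_sizes(path, target_part_size=None)
```

For example, is_sequential("sample.vcf.gz", target_part_size=None) is True, while the same call with the default target_part_size of "auto" is False because the input is then split into parts and processed in parallel.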
- Return type