sgkit.io.vcf.zarr_array_sizes
- sgkit.io.vcf.zarr_array_sizes(input, *, regions=None, target_part_size='auto')
Make a pass through a VCF/BCF file to determine sizes for storage in Zarr.
Deprecated since version 0.9.0: Functions for reading VCF are deprecated; please use the bio2zarr package instead.
By default, the input is processed in parts in parallel. However, if the input is a single file, target_part_size is None, and regions is None, then the operation will be carried out sequentially.
- Parameters:
- input
  str | Path | Sequence[Union[str, Path]]
  A path (or paths) to the input BCF or VCF file (or files). VCF files should be compressed and have a .tbi or .csi index file. BCF files should have a .csi index file.
- target_part_size
  None | int | str (default: 'auto')
  The desired size, in bytes, of each (compressed) part of the input to be processed in parallel. Defaults to "auto", which will pick a good size (currently 20MB). A value of None means that the input will be processed sequentially. The setting will be ignored if regions is also specified.
- regions
  None | Sequence[str] | Sequence[Optional[Sequence[str]]] (default: None)
  Genomic region or regions to extract variants for. For multiple inputs, multiple input regions are specified as a sequence of values, each of which may be None or a sequence of region strings. Takes priority over target_part_size if both are not None.
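The interaction between input, regions, and target_part_size described above can be restated as a small sketch. The helpers `processing_mode` and `partitioning_source` are hypothetical names introduced here for illustration; they are not part of sgkit and only mirror the documented rules. The sketch assumes that "a single file" means a single str or Path rather than a one-element sequence, and the file names and region strings are placeholders.

```python
from pathlib import Path
from typing import Optional, Sequence, Union

PathType = Union[str, Path]
RegionsType = Union[None, Sequence[str], Sequence[Optional[Sequence[str]]]]


def processing_mode(
    input: Union[PathType, Sequence[PathType]],
    *,
    regions: RegionsType = None,
    target_part_size: Union[None, int, str] = "auto",
) -> str:
    """Hypothetical helper: restates the documented rule, not sgkit's code.

    Sequential processing applies only when the input is a single file
    (assumed here to mean a single str or Path) and both target_part_size
    and regions are None. Every other combination is processed in parallel.
    """
    single_file = isinstance(input, (str, Path))
    if single_file and target_part_size is None and regions is None:
        return "sequential"
    return "parallel"


def partitioning_source(
    regions: RegionsType = None,
    target_part_size: Union[None, int, str] = "auto",
) -> Optional[str]:
    """Hypothetical helper: which setting determines how the input is split.

    regions takes priority over target_part_size when both are not None.
    """
    if regions is not None:
        return "regions"
    if target_part_size is not None:
        return "target_part_size"
    return None


# Placeholder inputs; no files are opened by this sketch.
print(processing_mode("sample.vcf.gz"))
print(processing_mode("sample.vcf.gz", target_part_size=None))
print(partitioning_source(regions=["chr1:1-100000"], target_part_size="auto"))

# For multiple inputs, regions is a sequence whose entries are either None
# or a sequence of region strings.
multi_regions = [["chr1:1-100000", "chr1:100001-200000"], None]
print(processing_mode(["a.vcf.gz", "b.vcf.gz"], regions=multi_regions))
```

Under this reading, a call such as `zarr_array_sizes("sample.vcf.gz", target_part_size=None)` with regions left at its default would run sequentially, while leaving target_part_size at 'auto' would split the input into parts and process them in parallel.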
- Return type: