sgkit.read_scikit_allel_vcfzarr
- sgkit.read_scikit_allel_vcfzarr(path, field_defs=None)
Read a VCF Zarr file created using scikit-allel.
Deprecated since version 0.9.0: Functions for reading VCF are deprecated; please use the bio2zarr package instead.
Loads VCF variant, sample, and genotype data as Dask arrays within a Dataset from a Zarr file created using scikit-allel’s vcf_to_zarr function.
This allows conversion from scikit-allel’s Zarr format to sgkit’s VCF Zarr format.
Since vcf_to_zarr does not preserve phasing information, there is no sgkit.variables.call_genotype_phased_spec variable in the resulting dataset.
- Parameters:
- path (str | Path): Path to the Zarr file.
- field_defs ({str: {str: Any}} | None, default: None): Per-field information that overrides the field definitions in the VCF header, or provides extra information needed in the dataset representation. Definitions are represented as a dictionary whose keys are the field names and whose values are dictionaries with any of the following keys: Number, Type, Description, dimension. The first three correspond to VCF header values, and dimension is the name of the final dimension in the array for the case where Number is a fixed integer larger than 1. For example, {"INFO/AC": {"Number": "A"}, "FORMAT/HQ": {"dimension": "haplotypes"}} overrides the INFO/AC field to be Number A (useful if the VCF defines it as having variable length with .), and names the final dimension of the HQ array (which is defined as Number 2 in the VCF header) as haplotypes. (Note that Number A is the number of alternate alleles; see section 1.4.2 of the VCF spec: https://samtools.github.io/hts-specs/VCFv4.3.pdf.)
- Return type: Dataset
- Returns: A dataset containing the following variables:
  - sgkit.variables.variant_id_spec (variants)
  - sgkit.variables.variant_contig_spec (variants)
  - sgkit.variables.variant_position_spec (variants)
  - sgkit.variables.variant_allele_spec (variants)
  - sgkit.variables.sample_id_spec (samples)
  - sgkit.variables.call_genotype_spec (variants, samples, ploidy)
  - sgkit.variables.call_genotype_mask_spec (variants, samples, ploidy)
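
The following is a minimal sketch of how the field_defs argument described above might be assembled and checked before calling the reader. The names ALLOWED_KEYS, check_field_defs, and load_allel_zarr are hypothetical helpers introduced here for illustration and are not part of sgkit; only sgkit.read_scikit_allel_vcfzarr(path, field_defs=None) comes from this page. The check covers the four documented per-field keys only and is an assumption about what is useful to validate up front, not a description of what sgkit does internally.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Union

# Keys this page lists as valid inside each per-field dictionary.
ALLOWED_KEYS = {"Number", "Type", "Description", "dimension"}


def check_field_defs(field_defs: Optional[Dict[str, Dict[str, Any]]]) -> None:
    """Hypothetical helper (not part of sgkit): reject unknown per-field keys."""
    for field, overrides in (field_defs or {}).items():
        unknown = set(overrides) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"{field}: unsupported keys {sorted(unknown)}")


def load_allel_zarr(
    path: Union[str, Path],
    field_defs: Optional[Dict[str, Dict[str, Any]]] = None,
):
    """Hypothetical wrapper: validate field_defs, then call the sgkit reader."""
    check_field_defs(field_defs)
    # Imported lazily so the validation helper works without sgkit installed.
    # The reader is deprecated since 0.9.0, so it may be absent in later releases.
    import sgkit

    return sgkit.read_scikit_allel_vcfzarr(path, field_defs=field_defs)


# The example overrides from the field_defs parameter description.
field_defs = {
    "INFO/AC": {"Number": "A"},
    "FORMAT/HQ": {"dimension": "haplotypes"},
}
check_field_defs(field_defs)  # raises ValueError only if a key is unrecognized
```

With sgkit installed and a Zarr store produced by scikit-allel's vcf_to_zarr, a call such as load_allel_zarr("example.vcf.zarr", field_defs) would then return the dataset; the path shown is a placeholder.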