Introducing sgkit

The sgkit team is pleased to announce the release of sgkit 0.5.0! This release adds support for the VCF Zarr specification, which describes an encoding of VCF data in chunked-columnar form using the Zarr format.
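To make "chunked-columnar" a little more concrete, here is a toy pure-Python sketch of the general idea. It is not the VCF Zarr specification and not sgkit's API. It only assumes two things: each field is stored as its own array (columnar), and each array is cut into fixed-size blocks along the variants and samples dimensions (chunked). The field names `variant_position` and `call_alt_count`, the chunk sizes, and the data values are illustrative assumptions.

```python
def chunk_1d(values, size):
    """Split a 1-D per-variant field into chunks keyed by chunk index."""
    return {(start // size,): values[start:start + size]
            for start in range(0, len(values), size)}


def chunk_2d(rows, variant_size, sample_size):
    """Split a 2-D (variants x samples) field into a grid of chunks.

    Each chunk is keyed by its (variant_chunk, sample_chunk) grid coordinate,
    similar in spirit to how a chunked array store addresses its chunks.
    """
    n_variants = len(rows)
    n_samples = len(rows[0]) if rows else 0
    chunks = {}
    for v0 in range(0, n_variants, variant_size):
        for s0 in range(0, n_samples, sample_size):
            block = [row[s0:s0 + sample_size] for row in rows[v0:v0 + variant_size]]
            chunks[(v0 // variant_size, s0 // sample_size)] = block
    return chunks


def get_call(chunks, variant, sample, variant_size, sample_size):
    """Look up one call by touching only the single chunk that contains it."""
    block = chunks[(variant // variant_size, sample // sample_size)]
    return block[variant % variant_size][sample % sample_size]


# Hypothetical toy data: 5 variants x 4 samples, one alt-allele count per call.
positions = [101, 205, 310, 422, 530]
genotypes = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [2, 0, 1, 1],
    [0, 0, 0, 2],
    [1, 2, 1, 0],
]

# Columnar: every field is its own array, chunked independently of the others.
store = {
    "variant_position": chunk_1d(positions, 2),
    "call_alt_count": chunk_2d(genotypes, 2, 2),
}
```

In a layout like this, reading a subset of variants or samples only requires fetching the chunks that overlap that subset, and each field can be read without touching the others.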

With this release, we also introduce our news page, where we will announce future releases and provide other relevant updates for the sgkit project.

Oxford and Related Sciences began collaborating in early 2020 on sgkit as a successor to the popular scikit-allel library. We’ve worked closely with third-party library authors to read and write data stored in VCF (cyvcf2), BGEN (cbgen), and PLINK (bed_reader) files. We’ve designed an Xarray-based data model and implemented many common methods from statistical and population genetics, including variant and sample quality control, kinship analysis, genome-wide selection scans, and genome-wide association analyses, as well as a novel implementation of the recently developed REGENIE algorithm.

sgkit was accepted as a NumFOCUS Sponsored Project in 2021, and we now have developers in the US, the UK, and New Zealand.

If you think sgkit might be useful for your project, please don’t hesitate to file an issue or start a discussion with questions and feedback!