sgkit: Statistical genetics toolkit in Python
Sgkit is a Python package that provides a variety of analytical genetics methods built on general-purpose frameworks such as Xarray, Pandas, Dask and Zarr. The sgkit API makes as few assumptions as possible about the origin, structure, and intended use of genetic data. Instead, it adopts a set of domain-specific conventions that allow such data to be used within this broader ecosystem of tools. The package is designed for complex workflows over large distributed datasets, but it also aims to make it easy to scale down to smaller datasets and to access simpler functionality for those who may be new to Python (though there is still a good deal of work to be done on this front). See Getting Started for more details.
Sgkit is heavily inspired by scikit-allel and Hail, two popular Python genetics toolkits that focus on population genetics and quantitative genetics, respectively.
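To illustrate what "domain-specific conventions on top of general-purpose tools" can look like in practice, here is a minimal sketch that uses only Xarray and NumPy, not sgkit itself. It assumes the convention of storing genotype calls in a variable named `call_genotype` with dimensions `variants`, `samples` and `ploidy`, and of marking missing alleles with `-1`. The toy data, the extra variable names and the derived quantities are made up for illustration; see Getting Started and the User Guide for the actual sgkit API.

```python
import numpy as np
import xarray as xr

# Toy genotype calls: 4 variants x 3 samples x 2 alleles (diploid).
# Assumption: -1 marks a missing allele call.
calls = np.array(
    [
        [[0, 0], [0, 1], [1, 1]],
        [[0, 1], [0, 1], [0, 0]],
        [[1, 1], [-1, -1], [0, 1]],
        [[0, 0], [0, 0], [0, 0]],
    ],
    dtype="int8",
)

# An ordinary Xarray dataset; only the variable and dimension names
# carry the genetics-specific meaning.
ds = xr.Dataset(
    data_vars={
        "call_genotype": (("variants", "samples", "ploidy"), calls),
        "variant_position": (("variants",), np.array([100, 250, 400, 900])),
        "sample_id": (("samples",), np.array(["S0", "S1", "S2"])),
    }
)

# Because the data is plain Xarray, generic operations apply directly.
gt = ds["call_genotype"]
called = gt >= 0  # True where an allele was actually called
alt_count = (gt == 1).sum(dim=("samples", "ploidy"))
alt_freq = alt_count / called.sum(dim=("samples", "ploidy"))

print(alt_count.values)
print(alt_freq.values)
```

Nothing in this sketch is specific to genetics software: the same dataset could be sliced, chunked with Dask, or written to Zarr using the standard Xarray methods.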
- Getting Started
- User Guide
- Examples
- API reference
- How do I …
  - Create a test dataset?
  - Look at the dataset summary?
  - Get the values for a variable in a dataset?
  - Find the definition for a variable in a dataset?
  - Look at the genotypes?
  - Subset the variables?
  - Subset to a genomic range?
  - Get the list of samples?
  - Subset the samples?
  - Define a new variable based on others?
  - Get summary stats?
  - Filter variants?
  - Find which new variables were added by a method?
  - Save results to a Zarr file?
  - Load a dataset from Zarr?
- Contributing to sgkit
- About
- News
- Changelog