API reference
This page provides an auto-generated summary of sgkit’s API.
IO/imports
BGEN
Convert a BGEN file to a Zarr on-disk store.
Read BGEN dataset.
Rechunk BGEN dataset as Zarr.
PLINK
Read PLINK dataset.
VCF
Convert VCF files to a single Zarr on-disk store.
For more low-level control:
Calculate genomic region strings to partition a compressed VCF or BCF file into roughly equal parts.
Convert VCF files to multiple Zarr on-disk stores, one per region.
Combine multiple Zarr stores into a single Xarray dataset.
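The region-partitioning step can be pictured with a small pure-Python sketch. It is not sgkit’s implementation: it assumes the sorted 1-based variant positions of one contig are already known (a real tool would more likely work from the compressed file’s index), and the helper name `partition_positions` is hypothetical. It splits the positions into roughly equal-count groups and emits contiguous, non-overlapping region strings in the htslib `contig:start-end` style (1-based, inclusive).

```python
def partition_positions(contig, positions, num_parts):
    """Split sorted 1-based variant positions on one contig into region strings.

    Hypothetical helper: each region holds roughly len(positions) / num_parts
    variants, and consecutive regions are contiguous and non-overlapping.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    n = len(positions)
    k = min(num_parts, n)  # never ask for more parts than there are variants
    regions = []
    start = 1  # the first region begins at the start of the contig
    for i in range(k):
        hi = (i + 1) * n // k  # exclusive index just past this part's last variant
        end = positions[hi - 1]
        if end >= start:  # skip empty parts caused by duplicate positions
            regions.append(f"{contig}:{start}-{end}")
            start = end + 1
    return regions


print(partition_positions("chr1", [10, 20, 30, 40, 50, 60], 3))
# ['chr1:1-20', 'chr1:21-40', 'chr1:41-60']
```

Region queries return records that overlap a region, so a record whose REF spans a boundary could be returned for two adjacent regions; a reader built on this sketch would keep only records whose POS falls inside its own region.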
For converting from scikit-allel’s VCF Zarr representation to sgkit’s Zarr representation:
Read a VCF Zarr file created using scikit-allel.
Dataset
Load a dataset from Zarr storage.
Save a dataset to Zarr storage.
Methods
Compute per-sample allele counts from genotype calls.
Compute per-cohort allele counts from per-sample allele counts or genotype calls.
Compute allele counts from per-sample allele counts or genotype calls.
Compute divergence between pairs of cohorts.
Compute diversity from cohort allele counts.
Compute Fst between pairs of cohorts.
Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. (2015).
Run linear regression to identify continuous trait associations with genetic variants.
Exact test for Hardy-Weinberg equilibrium (HWE) as described in Wigginton et al. 2005 [1].
Compute PC-Relate as described in Conomos et al. 2016 [1].
REGENIE trait transformation.
Compute quality control sample statistics from genotype calls.
Compute Tajima’s D for a genotype call dataset.
Compute quality control variant statistics from genotype calls.
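Several of the methods above build on the same counting step. The pure-Python sketch below is not sgkit’s code (which works on chunked arrays); it assumes one variant’s calls are given as per-sample lists of allele indices, with `-1` marking a missing allele as in sgkit’s `call_genotype`. It counts alleles, turns the counts into the standard unbiased per-site diversity, and combines summed diversity with the number of segregating sites into Tajima’s D using the formula from Tajima (1989). All function names are hypothetical.

```python
import math

MISSING = -1  # missing-allele sentinel, as in sgkit's call_genotype


def count_variant_alleles(calls, n_alleles):
    """Count each allele across all samples at one variant, skipping missing alleles."""
    counts = [0] * n_alleles
    for call in calls:  # one call per sample, e.g. [0, 1] for a diploid het
        for allele in call:
            if allele != MISSING:
                counts[allele] += 1
    return counts


def site_diversity(allele_counts):
    """Probability that two distinct sampled chromosomes carry different alleles."""
    n = sum(allele_counts)
    if n < 2:
        return float("nan")  # undefined with fewer than two called alleles
    pairs = n * (n - 1) / 2
    same = sum(c * (c - 1) / 2 for c in allele_counts)
    return (pairs - same) / pairs


def tajimas_d(n, segregating_sites, pi):
    """Tajima's D for n >= 2 chromosomes, S segregating sites and summed diversity pi."""
    s = segregating_sites
    if s == 0:
        return float("nan")  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


calls = [[0, 1], [1, 1], [0, MISSING]]  # three diploid samples, one missing allele
counts = count_variant_alleles(calls, n_alleles=2)
print(counts, site_diversity(counts))  # [2, 3] 0.6
```

Summing `site_diversity` over the variants in a window, and counting the sites with more than one observed allele, gives the `pi` and `segregating_sites` inputs; the sketch assumes every site has the same number `n` of called chromosomes, which missing data can break.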
Utilities
Convert genotype probabilities to hard calls.
Display genotype calls.
Replace partial genotype calls with missing values.
Simulate genotype calls and variant/sample data.
Add fixed-size windowing information to a dataset.
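Three of these utilities are simple per-call or per-index transformations. The pure-Python sketch below only illustrates the idea and is not sgkit’s implementation. It assumes diploid, biallelic genotype probabilities ordered as (hom-ref, het, hom-alt), `-1` as the missing-allele value, a hypothetical `threshold` below which a call is set to missing, and windows defined over variant indices with an exclusive stop index. All function names are hypothetical.

```python
import math

MISSING = -1  # missing-allele sentinel
# Assumed ordering of the three diploid, biallelic genotype probabilities.
GENOTYPES = ([0, 0], [0, 1], [1, 1])  # hom-ref, het, hom-alt


def probability_to_call(probs, threshold):
    """Hard-call one genotype: keep the most probable genotype only if its
    probability reaches `threshold`, otherwise return a fully missing call."""
    if any(math.isnan(p) for p in probs):
        return [MISSING, MISSING]
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return [MISSING, MISSING]
    return list(GENOTYPES[best])


def filter_partial_calls(calls):
    """Turn any call with at least one missing allele into a fully missing call."""
    return [[MISSING] * len(call) if MISSING in call else list(call) for call in calls]


def fixed_size_windows(n_variants, size, step=None):
    """Start and (exclusive) stop indices of fixed-size windows over variants.

    `step` defaults to `size`, giving non-overlapping windows (an assumption).
    """
    step = size if step is None else step
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    starts = list(range(0, n_variants, step))
    stops = [min(start + size, n_variants) for start in starts]
    return starts, stops


print(probability_to_call([0.02, 0.95, 0.03], threshold=0.9))  # [0, 1]
print(filter_partial_calls([[0, MISSING], [1, 1]]))  # [[-1, -1], [1, 1]]
print(fixed_size_windows(10, size=4))  # ([0, 4, 8], [4, 8, 10])
```

With a `step` smaller than `size`, consecutive windows overlap and the trailing windows are truncated at the last variant.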
Variables
REGENIE’s base prediction (blocks, alphas, samples, outcomes).
Allele counts.
Dosages, encoded as floats, with NaN indicating a missing value.
A flag for each call indicating which values are missing.
Call genotypes in which partial genotype calls are replaced with completely missing genotype calls.
A flag for each call indicating which values are missing.
Call genotype.
A flag for each call indicating which values are missing.
A flag for each call indicating whether it is phased.
Genotype probabilities.
A flag for each call indicating which values are missing.
Cohort allele counts.
Covariate variable names.
Dosage variable name.
Genotype counts.
REGENIE’s loco_prediction (contigs, samples, outcomes).
REGENIE’s meta_prediction (samples, outcomes).
PC-Relate kinship coefficient matrix.
The fraction of variants with called genotypes.
The unique identifier of the sample.
The number of variants with called genotypes.
The number of variants with heterozygous calls.
The number of variants with homozygous alternate calls.
The number of variants with homozygous reference calls.
The number of variants that are not homozygous reference calls.
Sample PCs (PCxS).
Genetic divergence between pairs of cohorts.
Genetic diversity (also known as “Tajima’s pi”) for cohorts.
Fixation index (Fst) between pairs of cohorts.
Garud H1 statistic for cohorts.
Garud H12 statistic for cohorts.
Garud H123 statistic for cohorts.
Garud H2/H1 statistic for cohorts.
Tajima’s D for cohorts.
Trait (for example phenotype) variable names.
The possible alleles for the variant.
Variant allele counts.
The frequency of the occurrence of each allele.
The number of occurrences of all alleles.
Beta values associated with each variant and trait.
The fraction of samples with called genotypes.
Index corresponding to contig name for each variant.
P values from HWE test for each variant as float in [0, 1].
The unique identifier of the variant.
The number of samples with called genotypes.
The number of samples with heterozygous calls.
The number of samples with homozygous alternate calls.
The number of samples with homozygous reference calls.
The number of samples that are not homozygous reference calls.
P values as float in [0, 1].
The reference position of the variant.
T statistics for each beta.
The index values of window start positions along the
The index values of window stop positions along the
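The Garud H1, H12, H123 and H2/H1 statistics listed above are simple functions of haplotype frequencies. A minimal pure-Python sketch, following the definitions in Garud et al. (2015) rather than sgkit’s windowed, per-cohort implementation: it assumes the haplotypes of one window are given as hashable tuples of alleles, and the function name `garud_h` is hypothetical.

```python
from collections import Counter


def garud_h(haplotypes):
    """Return (H1, H12, H123, H2/H1) for one window of haplotypes.

    Each haplotype is a hashable sequence of alleles, e.g. a tuple.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    # Haplotype frequencies, most common first.
    freqs = sorted((count / n for count in Counter(haplotypes).values()), reverse=True)
    h1 = sum(p * p for p in freqs)
    h12 = sum(freqs[:2]) ** 2 + sum(p * p for p in freqs[2:])  # pool the top two
    h123 = sum(freqs[:3]) ** 2 + sum(p * p for p in freqs[3:])  # pool the top three
    h2_h1 = (h1 - freqs[0] ** 2) / h1  # H2 = H1 minus the top frequency squared
    return h1, h12, h123, h2_h1


haps = [(0, 1, 1), (0, 1, 1), (1, 0, 0), (0, 0, 1)]
print(garud_h(haps))
```

Pooling the most frequent haplotypes into one class is what lets H12 and H123 pick up soft sweeps, where several haplotypes rise in frequency together.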