API reference
This page provides an auto-generated summary of sgkit’s API.
IO/imports
PLINK
|
Read PLINK dataset. |
VCF
|
Calculate genomic region strings to partition a compressed VCF or BCF file into roughly equal parts. |
|
Convert VCF files to a single Zarr on-disk store. |
|
Convert VCF files to multiple Zarr on-disk stores, one per region. |
|
Combine multiple Zarr stores into a single Xarray dataset. |
|
Read a VCF Zarr file created using scikit-allel. |
Methods
|
Compute per sample allele counts from genotype calls. |
|
Compute per cohort allele counts from per-sample allele counts, or genotype calls. |
|
Compute variant allele counts from per-sample allele counts or genotype calls. |
|
Compute divergence between pairs of cohorts. |
|
Compute diversity from cohort allele counts. |
|
Compute Fst between pairs of cohorts. |
|
Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. |
|
Run linear regression to identify continuous trait associations with genetic variants. |
|
Exact test for HWE as described in Wigginton et al. |
|
Regenie trait transformation. |
|
Compute quality control variant statistics from genotype calls. |
|
Compute Tajima’s D for a genotype call dataset. |
|
Compute PC-Relate as described in Conomos et al. |
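The allele-counting methods listed above can be illustrated with a minimal sketch in plain Python. This is not sgkit's implementation: the nested-list layout `[variant][sample][ploidy]`, the `-1` sentinel for a missing allele, and the helper names `per_sample_allele_counts` and `per_variant_allele_counts` are all assumptions made for the example.

```python
from typing import List

# Genotype calls indexed as [variant][sample][ploidy].
# Assumption for this sketch: -1 marks a missing allele.
Calls = List[List[List[int]]]


def per_sample_allele_counts(calls: Calls, n_alleles: int) -> List[List[List[int]]]:
    """Return counts[v][s][a]: how often allele a occurs in sample s at variant v."""
    counts = []
    for variant in calls:
        row = []
        for genotype in variant:
            tally = [0] * n_alleles
            for allele in genotype:
                if allele >= 0:  # skip missing alleles
                    tally[allele] += 1
            row.append(tally)
        counts.append(row)
    return counts


def per_variant_allele_counts(calls: Calls, n_alleles: int) -> List[List[int]]:
    """Return counts[v][a]: per-sample allele counts summed over all samples."""
    per_sample = per_sample_allele_counts(calls, n_alleles)
    return [
        [sum(sample[a] for sample in variant) for a in range(n_alleles)]
        for variant in per_sample
    ]


# Two variants, three diploid samples; the second sample is missing at variant 1.
calls = [
    [[0, 0], [0, 1], [1, 1]],
    [[0, 1], [-1, -1], [1, 0]],
]
per_sample = per_sample_allele_counts(calls, n_alleles=2)
per_variant = per_variant_allele_counts(calls, n_alleles=2)
print(per_sample)
print(per_variant)
```

The per-variant counts are derived from the per-sample counts, mirroring the descriptions above that accept either per-sample allele counts or genotype calls as input.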
Utilities
|
Display genotype calls. |
|
Simulate genotype calls and variant/sample data. |
|
Add fixed-size windowing information to a dataset. |
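As a rough illustration of what fixed-size windowing information can look like, the sketch below computes window start and stop indexes over a run of variants. It is not sgkit's implementation: the helper name `fixed_size_windows`, the exclusive stop index, the truncation of the final window, and the optional `step` parameter are assumptions made for the example, and contig boundaries are ignored.

```python
from typing import List, Optional, Tuple


def fixed_size_windows(
    n_variants: int, size: int, step: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Return (starts, stops) index lists for windows of `size` variants.

    Assumptions for this sketch: stop indexes are exclusive, the last
    window is truncated at n_variants, and step defaults to size
    (non-overlapping windows).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    step = size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    starts = list(range(0, n_variants, step))
    stops = [min(start + size, n_variants) for start in starts]
    return starts, stops


starts, stops = fixed_size_windows(n_variants=10, size=4)
print(starts, stops)
```

With ten variants and a window size of four, the last window covers only the two remaining variants.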
Variables
REGENIE’s base prediction (blocks, alphas, samples, outcomes). |
|
Allele counts. |
|
Dosages, encoded as floats, with NaN indicating a missing value. |
|
TODO |
|
Call genotype. |
|
TODO |
|
A flag for each call indicating if it is phased or not. |
|
TODO |
|
TODO |
|
Covariate variable names. |
|
Dosage variable name. |
|
Genotype counts. |
|
REGENIE’s loco_prediction (contigs, samples, outcomes). |
|
REGENIE’s meta_prediction (samples, outcomes). |
|
PC Relate kinship coefficient matrix. |
|
The unique identifier of the sample. |
|
Sample PCs (PCxS). |
|
Garud H1 statistic for cohorts. |
|
Garud H12 statistic for cohorts. |
|
Garud H123 statistic for cohorts. |
|
Garud H2/H1 statistic for cohorts. |
|
Trait (for example phenotype) variable names. |
|
The possible alleles for the variant. |
|
Variant allele counts. |
|
The frequency of the occurrence of each allele. |
|
The number of occurrences of all alleles. |
|
Beta values associated with each variant and trait. |
|
The number of samples with heterozygous calls. |
|
Index corresponding to contig name for each variant. |
|
P values from HWE test for each variant as float in [0, 1]. |
|
The unique identifier of the variant. |
|
The number of samples with called genotypes. |
|
The number of samples with heterozygous calls. |
|
The number of samples with homozygous alternate calls. |
|
The number of samples with homozygous reference calls. |
|
The number of samples that are not homozygous reference calls. |
|
P values as float in [0, 1]. |
|
The reference position of the variant. |
|
T statistics for each beta. |
|
The index values of window start positions along the |
|
The index values of window stop positions along the |
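The per-variant sample counts described in this section (called, heterozygous, homozygous reference, homozygous alternate, non-reference) can be sketched for a single variant in plain Python. This is not sgkit's implementation: the helper name `variant_genotype_counts`, the dictionary keys, the `-1` missing sentinel, and the rule that a call with any missing allele counts as not called are all assumptions made for the example.

```python
from typing import Dict, List

# One allele index per chromosome copy; assumption: -1 means missing.
Genotype = List[int]


def variant_genotype_counts(genotypes: List[Genotype]) -> Dict[str, int]:
    """Count samples by genotype class for one variant (non-empty genotypes)."""
    counts = {"n_called": 0, "n_het": 0, "n_hom_ref": 0, "n_hom_alt": 0, "n_non_ref": 0}
    for genotype in genotypes:
        if any(allele < 0 for allele in genotype):
            continue  # assumption: any missing allele makes the call uncalled
        counts["n_called"] += 1
        if len(set(genotype)) > 1:
            counts["n_het"] += 1  # at least two different alleles
        elif genotype[0] == 0:
            counts["n_hom_ref"] += 1  # every allele is the reference
        else:
            counts["n_hom_alt"] += 1  # every allele is the same alternate
        if any(allele > 0 for allele in genotype):
            counts["n_non_ref"] += 1  # called and not homozygous reference
    return counts


# Five diploid samples at one variant; the fourth sample is missing.
counts = variant_genotype_counts([[0, 0], [0, 1], [1, 1], [-1, -1], [1, 0]])
print(counts)
```

Under these assumptions the non-reference count equals the heterozygous count plus the homozygous alternate count.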