API reference

This page provides an auto-generated summary of sgkit’s API.

IO/imports

VCF

partition_into_regions(vcf_path, *[, …])

Calculate genomic region strings to partition a compressed VCF or BCF file into roughly equal parts.

vcf_to_zarr(input, output, *[, regions, …])

Convert VCF files to a single Zarr on-disk store.

vcf_to_zarrs(input, output, regions[, …])

Convert VCF files to multiple Zarr on-disk stores, one per region.

zarrs_to_dataset(urls[, chunk_length, …])

Combine multiple Zarr stores into a single Xarray dataset.

read_vcfzarr(path)

Read a VCF Zarr file created using scikit-allel.

Methods

count_call_alleles(ds, *[, call_genotype, merge])

Compute per-sample allele counts from genotype calls.

count_cohort_alleles(ds, *[, …])

Compute per-cohort allele counts from per-sample allele counts or genotype calls.

count_variant_alleles(ds, *[, …])

Compute allele counts from per-sample allele counts or genotype calls.

divergence(ds, *[, cohort_allele_count, merge])

Compute divergence between pairs of cohorts.

diversity(ds, *[, cohort_allele_count, merge])

Compute diversity from cohort allele counts.

Fst(ds, *[, estimator, stat_divergence, merge])

Compute Fst between pairs of cohorts.

Garud_h(ds, *[, call_genotype, merge])

Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al.

gwas_linear_regression(ds, *, dosage, …[, …])

Run linear regression to identify continuous trait associations with genetic variants.

hardy_weinberg_test(ds, *[, …])

Exact test for HWE as described in Wigginton et al.

regenie(ds, *, dosage, covariates, traits[, …])

Regenie trait transformation.

variant_stats(ds, *[, call_genotype_mask, …])

Compute quality control variant statistics from genotype calls.

Tajimas_D(ds, *[, variant_allele_count, …])

Compute Tajima’s D for a genotype call dataset.

pc_relate(ds, *[, maf, call_genotype, …])

Compute PC-Relate as described in Conomos, et al.
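
To make the allele-count methods listed above more concrete, here is a minimal pure-Python sketch of counting alleles per call and then per variant. It is an illustration only, not sgkit's implementation: the helper names `sketch_call_allele_counts` and `sketch_variant_allele_counts`, the nested-list layout (variants, samples, ploidy), and the use of -1 as a missing-allele sentinel are all assumptions made for this example.

```python
# Hypothetical helpers for illustration; these are not part of sgkit.
MISSING = -1  # assumed sentinel for a missing allele within a call


def sketch_call_allele_counts(calls, n_alleles):
    """Count each allele index within every call.

    calls[v][s] is a list of allele indices of length ploidy.
    Missing alleles (MISSING) match no allele index, so they are not counted.
    """
    return [
        [[call.count(allele) for allele in range(n_alleles)] for call in variant]
        for variant in calls
    ]


def sketch_variant_allele_counts(call_counts):
    """Sum per-call allele counts over samples for each variant."""
    return [[sum(column) for column in zip(*variant)] for variant in call_counts]


# Two variants, three diploid samples; the second sample is missing at variant 1.
calls = [
    [[0, 0], [0, 1], [1, 1]],
    [[0, 1], [MISSING, MISSING], [0, 0]],
]

call_ac = sketch_call_allele_counts(calls, n_alleles=2)
variant_ac = sketch_variant_allele_counts(call_ac)
print(variant_ac)
```

The second function consumes the output of the first, mirroring how the per-variant and per-cohort summaries above are described as being derived from per-sample allele counts.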

Utilities

display_genotypes(ds[, max_variants, …])

Display genotype calls.

simulate_genotype_call_dataset(n_variant, …)

Simulate genotype calls and variant/sample data.

window(ds, size, step[, merge])

Add fixed-size windowing information to a dataset.
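
As a rough illustration of fixed-size windowing along the variants dimension, the sketch below computes start and stop indices for each window. The function name `sketch_window_bounds` is hypothetical, and the choice to truncate the final window at the number of variants is an assumption made for this example rather than a statement of how `window` behaves.

```python
# Hypothetical helper for illustration; this is not part of sgkit.
def sketch_window_bounds(n_variants, size, step):
    """Return (starts, stops) index lists for windows of `size` variants.

    Windows begin every `step` variants. Stops are exclusive and are
    clipped to n_variants, so the last window may be shorter (an assumption).
    """
    starts = list(range(0, n_variants, step))
    stops = [min(start + size, n_variants) for start in starts]
    return starts, stops


starts, stops = sketch_window_bounds(10, size=4, step=4)
print(list(zip(starts, stops)))
```

With `size` equal to `step` the windows tile the variants without overlap; a `step` smaller than `size` would produce overlapping windows.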

Variables

variables.base_prediction_spec

REGENIE’s base prediction (blocks, alphas, samples, outcomes).

variables.call_allele_count_spec

Allele counts.

variables.call_dosage_spec

Dosages, encoded as floats, with NaN indicating a missing value.

variables.call_dosage_mask_spec

TODO

variables.call_genotype_spec

Call genotype.

variables.call_genotype_mask_spec

TODO

variables.call_genotype_phased_spec

A flag for each call indicating if it is phased or not.

variables.call_genotype_probability_spec

TODO

variables.call_genotype_probability_mask_spec

TODO

variables.covariates_spec

Covariate variable names.

variables.dosage_spec

Dosage variable name.

variables.genotype_counts_spec

Genotype counts.

variables.loco_prediction_spec

REGENIE’s loco_prediction (contigs, samples, outcomes).

variables.meta_prediction_spec

REGENIE’s meta_prediction (samples, outcomes).

variables.pc_relate_phi_spec

PC Relate kinship coefficient matrix.

variables.sample_id_spec

The unique identifier of the sample.

variables.sample_pcs_spec

Sample PCs (PCxS).

variables.stat_Garud_h1_spec

Garud H1 statistic for cohorts.

variables.stat_Garud_h12_spec

Garud H12 statistic for cohorts.

variables.stat_Garud_h123_spec

Garud H123 statistic for cohorts.

variables.stat_Garud_h2_h1_spec

Garud H2/H1 statistic for cohorts.

variables.traits_spec

Trait (for example phenotype) variable names.

variables.variant_allele_spec

The possible alleles for the variant.

variables.variant_allele_count_spec

Variant allele counts.

variables.variant_allele_frequency_spec

The frequency of the occurrence of each allele.

variables.variant_allele_total_spec

The number of occurrences of all alleles.

variables.variant_beta_spec

Beta values associated with each variant and trait.

variables.variant_call_rate_spec

The fraction of samples with called genotypes.

variables.variant_contig_spec

Index corresponding to contig name for each variant.

variables.variant_hwe_p_value_spec

P values from HWE test for each variant as float in [0, 1].

variables.variant_id_spec

The unique identifier of the variant.

variables.variant_n_called_spec

The number of samples with called genotypes.

variables.variant_n_het_spec

The number of samples with heterozygous calls.

variables.variant_n_hom_alt_spec

The number of samples with homozygous alternate calls.

variables.variant_n_hom_ref_spec

The number of samples with homozygous reference calls.

variables.variant_n_non_ref_spec

The number of samples that are not homozygous reference calls.

variables.variant_p_value_spec

P values as float in [0, 1].

variables.variant_position_spec

The reference position of the variant.

variables.variant_t_value_spec

T statistics for each beta.

variables.window_start_spec

The index values of window start positions along the variants dimension.

variables.window_stop_spec

The index values of window stop positions along the variants dimension.
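
The per-variant genotype counts described by the `variant_n_*` entries above can be illustrated with a small pure-Python sketch. The helper name `sketch_genotype_counts`, the -1 missing sentinel, and the rule that a call with any missing allele counts as not called are assumptions for this example, not a description of sgkit's implementation.

```python
# Hypothetical helper for illustration; this is not part of sgkit.
def sketch_genotype_counts(variant_calls):
    """Classify the calls of one variant.

    variant_calls is a list of per-sample calls, each a list of allele
    indices where 0 is the reference allele and -1 marks a missing allele.
    """
    counts = {
        "n_called": 0,
        "n_het": 0,
        "n_hom_ref": 0,
        "n_hom_alt": 0,
        "n_non_ref": 0,
    }
    for call in variant_calls:
        if any(allele < 0 for allele in call):
            continue  # assumed: any missing allele makes the call uncalled
        counts["n_called"] += 1
        if all(allele == 0 for allele in call):
            counts["n_hom_ref"] += 1
            continue
        counts["n_non_ref"] += 1
        if len(set(call)) > 1:
            counts["n_het"] += 1  # more than one distinct allele
        else:
            counts["n_hom_alt"] += 1  # a single non-reference allele
    return counts


example = sketch_genotype_counts([[0, 0], [0, 1], [1, 1], [-1, -1], [1, 0]])
print(example)
```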