Changelog
0.6.0 (unreleased)
New Features
Add pedigree support. This allows parent-child relationships to be stored in sgkit, and provides a number of new pedigree methods: pedigree_inbreeding(), pedigree_inverse_kinship(), and pedigree_kinship(). (timothymillar, GH 786)
Implement a function to calculate the VanRaden genomic relationship matrix, genomic_relationship(). (timothymillar, PR 903, GH 874)
Generic functions for cohort sums and means. (timothymillar, PR 867, GH 730)
Toggle numba caching by environment variable SGKIT_DISABLE_NUMBA_CACHE. (timothymillar, PR 870, GH 869)
Add window_by_genome() for computing whole-genome statistics. (tomwhite, PR 945, GH 893)
Add window_by_interval() to create windows from arbitrary intervals. (tomwhite, PR 974)
Add contig_lengths dataset attribute if found in the VCF file. (tomwhite, PR 946, GH 464)
Add auto_rechunk option to sgkit.save_dataset to automatically rechunk the dataset before saving it to disk, if necessary, as zarr requires equal chunk sizes. (benjeffery, PR 988, GH 981)
Implement gene-ε for gene set association analysis. (tomwhite, PR 975, GH 692)
Add count_variant_genotypes() to count the occurrence of each possible genotype. (timothymillar, GH 911, PR 1002)
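For readers unfamiliar with the VanRaden genomic relationship matrix mentioned above, the following is a minimal pure-Python sketch of the commonly cited first VanRaden estimator for diploid, biallelic data. It is not sgkit's implementation and does not call genomic_relationship(). The function name vanraden_grm and the choice to estimate allele frequencies from the sample itself are assumptions made for illustration.

```python
def vanraden_grm(dosages):
    """Sketch of a VanRaden-style relationship matrix (hypothetical helper).

    dosages: list of per-sample lists of alternate-allele counts (0, 1 or 2),
    i.e. a samples x variants table for diploid, biallelic variants.
    """
    n_samples = len(dosages)
    n_variants = len(dosages[0])
    # Alternate allele frequency per variant, estimated from these samples
    # (assumption: no separate reference population is available).
    freqs = [
        sum(row[j] for row in dosages) / (2 * n_samples)
        for j in range(n_variants)
    ]
    # Centre each dosage by its expected value 2p.
    centred = [
        [row[j] - 2 * freqs[j] for j in range(n_variants)]
        for row in dosages
    ]
    # Scale by 2 * sum(p * (1 - p)) across variants.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [
        [sum(a * b for a, b in zip(row_i, row_k)) / scale for row_k in centred]
        for row_i in centred
    ]


# Three samples, three variants.
example_dosages = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
grm = vanraden_grm(example_dosages)
```

The result is a symmetric samples x samples matrix. A sample whose dosages equal the expected value 2p at every variant, such as the third sample here, gets zero relationship with every sample, including itself.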
Breaking changes
The count_a1 parameter to sgkit.io.plink.read_plink() previously defaulted to True but now defaults to False. Furthermore, True is no longer supported since it is not clear how it should behave. (tomwhite, PR 952, GH 947)
The dosage variable specification has been removed and all references to it have been replaced with sgkit.variables.call_dosage_spec, which has been generalized to include integer encodings. Additionally, the default value for the dosage parameter in ld_matrix() and ld_prune() has been changed from 'dosage' to 'call_dosage'. (timothymillar, PR 995, GH 875)
The genotype_count variable has been removed in favour of sgkit.variables.variant_genotype_count_spec, which follows VCF ordering (i.e., homozygous reference, heterozygous, homozygous alternate for biallelic, diploid genotypes). hardy_weinberg_test() now defaults to using sgkit.variables.variant_genotype_count_spec for the genotype_count parameter. (timothymillar, GH 911, PR 1002)
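The VCF ordering referred to above can be illustrated with a short sketch. For an unphased diploid genotype j/k with j <= k, the VCF specification places it at index k(k+1)/2 + j, which gives 0/0, 0/1, 1/1 for biallelic variants. The helper names below are hypothetical and the code does not use sgkit. Encoding missing alleles as -1 is an assumption of the example.

```python
def vcf_genotype_index(a, b):
    """Position of the unphased diploid genotype a/b in VCF ordering."""
    lo, hi = sorted((a, b))
    return hi * (hi + 1) // 2 + lo


def count_genotypes(calls, n_alleles=2):
    """Count diploid genotype calls for one variant in VCF order.

    calls: iterable of (allele, allele) pairs; pairs containing a
    negative allele are treated as missing and skipped (assumption).
    """
    n_genotypes = n_alleles * (n_alleles + 1) // 2
    counts = [0] * n_genotypes
    for a, b in calls:
        if a < 0 or b < 0:
            continue
        counts[vcf_genotype_index(a, b)] += 1
    return counts


# Biallelic example: two hom-ref, two het (in either allele order),
# one hom-alt and one missing call.
calls = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (-1, -1)]
counts = count_genotypes(calls)
```

With this ordering, code that previously indexed genotype counts in a different order would need its indices remapped, which is why the change is listed as breaking.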
Improvements
Bug fixes
Allow chunking in the samples dimension for identity_by_state(). (timothymillar, PR 837, GH 836)
Remove VLenUTF8 from filters to avoid double encoding error. (tomwhite, PR 852, GH 785)
Fix numpy input for Weir_Goudet_beta. (timothymillar, PR 865, GH 861)
Fix get_region_start to work with contig names that have colons and dashes. (d-laub, PR 883, GH 882)
Fixes to VCF reading and writing found by hypothesis testing. (tomwhite, PR 972)
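To show why contig names containing colons and dashes make region strings awkward to parse, here is a minimal sketch of one possible approach: anchor the match to the end of the string so that only a trailing ":start-end" (or ":start-") suffix is treated as coordinates. The function region_start is hypothetical and is not the code of get_region_start. Returning 1 when no coordinates are given is an assumption of this example.

```python
import re

# Match ":<start>-<end>" or ":<start>-" only at the very end of the string,
# so colons and dashes inside the contig name are left alone.
_REGION_SUFFIX = re.compile(r":(\d+)-(\d*)$")


def region_start(region):
    """Return the 1-based start of a region string (hypothetical helper)."""
    match = _REGION_SUFFIX.search(region)
    if match is None:
        # No coordinate suffix: assume the region covers the whole contig.
        return 1
    return int(match.group(1))


starts = [
    region_start("chr1:100-200"),
    region_start("HLA-DRB1*15:01:01:500-900"),
    region_start("contig-1:2-3:10-20"),
    region_start("my-contig"),
]
```

Splitting naively on the first ":" or "-" would misread the second and third examples, whereas matching from the end picks up only the final coordinate pair.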