Changelog
0.6.0 (unreleased)
New Features
- Add pedigree support. This allows parent-child relationships to be stored in sgkit, and provides a number of new pedigree methods: pedigree_inbreeding(), pedigree_inverse_kinship(), and pedigree_kinship(). (timothymillar, GH 786)
- Implement a function to calculate the VanRaden genomic relationship matrix, genomic_relationship(). (timothymillar, PR 903, GH 874)
- Generic functions for cohort sums and means. (timothymillar, PR 867, GH 730)
- Toggle numba caching by environment variable SGKIT_DISABLE_NUMBA_CACHE. (timothymillar, PR 870, GH 869)
- Add window_by_genome() for computing whole-genome statistics. (tomwhite, PR 945, GH 893)
- Add window_by_interval() to create windows from arbitrary intervals. (tomwhite, PR 974)
- Add contig_lengths dataset attribute if found in the VCF file. (tomwhite, PR 946, GH 464)
- Add auto_rechunk option to sgkit.save_dataset to automatically rechunk the dataset before saving it to disk, if necessary, as zarr requires equal chunk sizes. (benjeffery, PR 988, GH 981)
- Implement gene-ε for gene set association analysis. (tomwhite, PR 975, GH 692)
- Add count_variant_genotypes() to count the occurrence of each possible genotype. (timothymillar, GH 911, PR 1002)
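For readers unfamiliar with the VanRaden genomic relationship matrix mentioned above, the following is a minimal pure-Python sketch of the standard (first) VanRaden estimator. It is not sgkit's implementation and does not use the genomic_relationship() API; the function name vanraden_grm, its arguments, and the variants-by-samples layout are assumptions made for illustration only.

```python
from typing import List, Sequence


def vanraden_grm(
    dosage: Sequence[Sequence[float]],
    frequency: Sequence[float],
    ploidy: int = 2,
) -> List[List[float]]:
    """Illustrative VanRaden estimator (hypothetical helper, not the sgkit API).

    dosage[v][s] is the alternate-allele dosage of sample s at variant v,
    and frequency[v] is the alternate-allele frequency at variant v.
    """
    n_samples = len(dosage[0])
    # Centre each variant on its expected dosage, ploidy * p.
    centred = [
        [d - ploidy * p for d in row] for row, p in zip(dosage, frequency)
    ]
    # Scale by the expected dosage variance summed over variants.
    denom = ploidy * sum(p * (1.0 - p) for p in frequency)
    return [
        [
            sum(row[i] * row[j] for row in centred) / denom
            for j in range(n_samples)
        ]
        for i in range(n_samples)
    ]


# Two variants, three diploid samples, both allele frequencies 0.5.
grm = vanraden_grm([[0, 1, 2], [2, 1, 0]], [0.5, 0.5])
```

The result is a symmetric samples-by-samples matrix. A real analysis would use array operations rather than nested lists; the loops here only make the formula explicit.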
Breaking changes
- The count_a1 parameter to sgkit.io.plink.read_plink() previously defaulted to True but now defaults to False. Furthermore, True is no longer supported since it is not clear how it should behave. (tomwhite, PR 952, GH 947)
- The dosage variable specification has been removed and all references to it have been replaced with sgkit.variables.call_dosage_spec which has been generalized to include integer encodings. Additionally, the default value for the dosage parameter in ld_matrix() and ld_prune() has been changed from 'dosage' to 'call_dosage'. (timothymillar, PR 995, GH 875)
- The genotype_count variable has been removed in favour of sgkit.variables.variant_genotype_count_spec which follows VCF ordering (i.e., homozygous reference, heterozygous, homozygous alternate for biallelic, diploid genotypes). hardy_weinberg_test() now defaults to using sgkit.variables.variant_genotype_count_spec for the genotype_count parameter. (timothymillar, GH 911, PR 1002)
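The VCF ordering referred to above can be illustrated with a short pure-Python sketch. It counts diploid genotypes at a single variant and places the unordered genotype j/k (with j <= k) at index k(k+1)/2 + j, which is the ordering defined by the VCF specification. The function name count_genotypes_vcf_order is hypothetical, and treating negative allele values as missing calls is an assumption; this is not the sgkit implementation of count_variant_genotypes().

```python
from typing import Iterable, List, Tuple


def count_genotypes_vcf_order(
    calls: Iterable[Tuple[int, int]], n_alleles: int = 2
) -> List[int]:
    """Count diploid genotypes in VCF order (hypothetical helper).

    For two alleles the result is
    [homozygous reference, heterozygous, homozygous alternate].
    """
    n_genotypes = n_alleles * (n_alleles + 1) // 2
    counts = [0] * n_genotypes
    for a, b in calls:
        # Assumption: a negative allele marks a missing call, which is skipped.
        if a < 0 or b < 0:
            continue
        j, k = sorted((a, b))  # phase is ignored: 1/0 counts as 0/1
        counts[k * (k + 1) // 2 + j] += 1
    return counts


biallelic = count_genotypes_vcf_order([(0, 0), (0, 1), (1, 0), (1, 1), (-1, -1)])
triallelic = count_genotypes_vcf_order([(0, 2), (1, 2), (2, 2)], n_alleles=3)
```

With three alleles the same formula yields the order 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.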
Improvements
Bug fixes
- Allow chunking in the samples dimension for identity_by_state(). (timothymillar, PR 837, GH 836)
- Remove VLenUTF8 from filters to avoid double encoding error. (tomwhite, PR 852, GH 785)
- Fix numpy input for Weir_Goudet_beta. (timothymillar, PR 865, GH 861)
- Fix get_region_start to work with contig names that have colons and dashes. (d-laub, PR 883, GH 882)
- Fixes to VCF reading and writing found by hypothesis testing. (tomwhite, PR 972)
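To show why contig names containing colons and dashes make region parsing tricky, here is a minimal sketch that extracts the start position from a region string of the form contig:start-end. The helper region_start and its regular expression are hypothetical and are not the code used by sgkit's get_region_start; the sketch also assumes that a region without an explicit start begins at position 1.

```python
import re

# Anchor the numeric part at the end of the string so that colons and
# dashes inside the contig name are left alone.
_REGION = re.compile(r"^(?P<contig>.+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def region_start(region: str) -> int:
    """Return the 1-based start of a region string (hypothetical helper)."""
    match = _REGION.match(region)
    if match is None:
        # Assumption: no explicit start means the whole contig, from 1.
        return 1
    return int(match.group("start"))


starts = [
    region_start("chr1:100-200"),
    region_start("HLA-DRB1*12:17:5-10"),
    region_start("chr1:100"),
    region_start("chr1"),
]
```

This approach cannot distinguish a bare contig name that itself ends in a colon followed by digits from a contig with a start position; resolving that ambiguity would require checking against the list of known contig names.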