.. currentmodule:: sgkit Changelog ========= .. _changelog.0.6.0: 0.6.0 (unreleased) ------------------ New Features ~~~~~~~~~~~~ - Add support for Python 3.10. (:user:`tomwhite`, :pr:`813`, :issue:`801`) - Add pedigree support. This allows parent-child relationships to be stored in sgkit, and provides a number of new pedigree methods: :func:`pedigree_inbreeding`, :func:`pedigree_inverse_kinship`, and :func:`pedigree_kinship`. (:user:`timothymillar`, :issue:`786`) - Implement a function to calculate the VanRaden genomic relationship matrix, :func:`genomic_relationship`. (:user:`timothymillar`, :pr:`903`, :issue:`874`) - Generic functions for cohort sums and means. (:user:`timothymillar`, :pr:`867`, :issue:`730`) - Toggle numba caching by environment variable ``SGKIT_DISABLE_NUMBA_CACHE``. (:user:`timothymillar`, :pr:`870`, :issue:`869`) - Add :func:`window_by_genome` for computing whole-genome statistics. (:user:`tomwhite`, :pr:`945`, :issue:`893`) - Add :func:`window_by_interval` to create windows from arbitrary intervals. (:user:`tomwhite`, :pr:`974`) - Add ``contig_lengths`` dataset attribute if found in the VCF file. (:user:`tomwhite`, :pr:`946`, :issue:`464`) - Add VCF export functions. (:user:`tomwhite`, :pr:`953`, :issue:`924`) - Add ``auto_rechunk`` option to ``sgkit.save_dataset`` to automatically rechunk the dataset before saving it to disk, if necessary, as zarr requires equal chunk sizes. (:user:`benjeffery`, :pr:`988`, :issue:`981`) - Implement gene-ε for gene set association analysis. (:user:`tomwhite`, :pr:`975`, :issue:`692`) - Add :func:`count_variant_genotypes` to count the occurrence of each possible genotype. (:user:`timothymillar`, :issue:`911`, :pr:`1002`) Breaking changes ~~~~~~~~~~~~~~~~ - Remove support for Python 3.7. (:user:`tomwhite`, :pr:`927`, :issue:`802`) - The ``count_a1`` parameter to :func:`sgkit.io.plink.read_plink` previously defaulted to ``True`` but now defaults to ``False``. Furthermore, ``True`` is no longer supported since it is not clear how it should behave. (:user:`tomwhite`, :pr:`952`, :issue:`947`) - The ``dosage`` variable specification has been removed and all references to it have been replaced with :data:`sgkit.variables.call_dosage_spec` which has been generalized to include integer encodings. Additionally, the default value for the ``dosage`` parameter in :func:`ld_matrix` and :func:`ld_prune` has been changed from ``'dosage'`` to ``'call_dosage'``. (:user:`timothymillar`, :pr:`995`, :issue:`875`) - The ``genotype_count`` variable has been removed in favour of :data:`sgkit.variables.variant_genotype_count_spec` which follows VCF ordering (i.e., homozygous reference, heterozygous, homozygous alternate for biallelic, diploid genotypes). :func:`hardy_weinberg_test` now defaults to using :data:`sgkit.variables.variant_genotype_count_spec` for the ``genotype_count`` parameter. (:user:`timothymillar`, :issue:`911`, :pr:`1002`) .. Deprecations .. ~~~~~~~~~~~~ Improvements ~~~~~~~~~~~~ - Improvements to VCF parsing performance. (:user:`benjeffery`, :pr:`933`) - Improve default VCF compression. (:user:`tomwhite`, :pr:`937`, :issue:`925`) - Ensure chunking is not excessive in samples dimension. (:user:`tomwhite`, :pr:`943`) - Add asv benchmarks for VCF performance. (:user:`tomwhite`, :pr:`976`) - Add asv benchmarks for VCF compression size. (:user:`tomwhite`, :pr:`978`) Bug fixes ~~~~~~~~~ - Allow chunking in the samples dimension for :func:`identity_by_state`. (:user:`timothymillar`, :pr:`837`, :issue:`836`) - Remove VLenUTF8 from filters to avoid double encoding error. (:user:`tomwhite`, :pr:`852`, :issue:`785`) - Fix numpy input for ``Weir_Goudet_beta``. (:user:`timothymillar`, :pr:`865`, :issue:`861`) - Fix ``get_region_start`` to work with contig names that have colons and dashes. (:user:`d-laub`, :pr:`883`, :issue:`882`) - Fixes to VCF reading and writing found by hypothesis testing. (:user:`tomwhite`, :pr:`972`) .. Documentation .. ~~~~~~~~~~~~~