sgkit.diversity#
- sgkit.diversity(ds, *, cohort_allele_count='cohort_allele_count', merge=True)#
Compute diversity from cohort allele counts.
By default, values of this statistic are calculated per variant. To compute values in windows, call
window_by_position()
orwindow_by_variant()
before calling this function.- Parameters:
- ds
Dataset
Genotype call dataset.
- cohort_allele_count
Hashable
(default:'cohort_allele_count'
) Cohort allele count variable to use or calculate. Defined by
sgkit.variables.cohort_allele_count_spec
. If the variable is not present inds
, it will be computed usingcount_cohort_alleles()
.- merge
bool
(default:True
) If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- ds
- Return type:
- Returns:
: A dataset containing the diversity values, as defined by
sgkit.variables.stat_diversity_spec
. Shape (variants, cohorts), or (windows, cohorts) if windowing information is available.
Warning
This method does not currently support datasets that are chunked along the samples dimension.
Examples
>>> import numpy as np >>> import sgkit as sg >>> import xarray as xr >>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts >>> sample_cohort = np.repeat([0, 1], ds.sizes["samples"] // 2) >>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> sg.diversity(ds)["stat_diversity"].values array([[0.5 , 0.66666667], [0.66666667, 0.5 ], [0.66666667, 0.66666667], [0.5 , 0.5 ], [0.5 , 0.5 ]])
>>> # Divide into windows of size three (variants) >>> ds = sg.window_by_variant(ds, size=3) >>> sg.diversity(ds)["stat_diversity"].values array([[1.83333333, 1.83333333], [1. , 1. ]])