sgkit.observed_heterozygosity
sgkit.observed_heterozygosity(ds, *, call_heterozygosity='call_heterozygosity', sample_cohort='sample_cohort', merge=True)
Compute per cohort observed heterozygosity.
The observed heterozygosity of a cohort is the mean of the individual heterozygosity values of all samples in that cohort, as described in individual_heterozygosity(). Calls with a nan value for individual heterozygosity are ignored when calculating the cohort mean.
By default, values of this statistic are calculated per variant. To compute values in windows, call window_by_position() or window_by_variant() before calling this function.
- Parameters:
- ds : Dataset
  Dataset containing genotype calls.
- call_heterozygosity : Hashable (default: 'call_heterozygosity')
  Input variable name holding call_heterozygosity as defined by sgkit.variables.call_heterozygosity_spec. If the variable is not present in ds, it will be computed using individual_heterozygosity().
- sample_cohort : Hashable (default: 'sample_cohort')
  Input variable name holding sample_cohort as defined by sgkit.variables.sample_cohort_spec.
- merge : bool (default: True)
  If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type: Dataset
- Returns: A dataset containing sgkit.variables.stat_observed_heterozygosity_spec of per-cohort observed heterozygosity with shape (variants, cohorts), containing values within the interval [0, 1] or nan.
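The cohort mean described above can be illustrated with a minimal pure-Python sketch for a single variant. This is not sgkit's implementation: the helper name `cohort_observed_heterozygosity`, its `n_cohorts` argument, and the input values are hypothetical, and returning nan for a cohort with no usable calls is an assumption chosen to match the documented "[0, 1] or nan" range.

```python
import math


def cohort_observed_heterozygosity(call_het, sample_cohort, n_cohorts):
    """Mean individual heterozygosity per cohort for one variant, skipping nan calls."""
    totals = [0.0] * n_cohorts
    counts = [0] * n_cohorts
    for het, cohort in zip(call_het, sample_cohort):
        if math.isnan(het):
            # nan calls are ignored when computing the cohort mean
            continue
        totals[cohort] += het
        counts[cohort] += 1
    # Assumption: a cohort with no non-nan calls yields nan
    return [t / c if c else math.nan for t, c in zip(totals, counts)]


# Hypothetical individual heterozygosity values for one variant, four samples
call_het = [1.0, 0.0, math.nan, 1.0]
sample_cohort = [0, 0, 1, 1]

result = cohort_observed_heterozygosity(call_het, sample_cohort, n_cohorts=2)
print(result)  # [0.5, 1.0]
```

In the second cohort only one call is usable, so the mean is taken over that single sample rather than over both.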
Examples
>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts
>>> sample_cohort = np.repeat([0, 1], ds.sizes["samples"] // 2)
>>> ds["sample_cohort"] = xr.DataArray(sample_cohort, dims="samples")
>>> sg.observed_heterozygosity(ds)["stat_observed_heterozygosity"].values
array([[0.5, 1. ],
       [1. , 0.5],
       [0. , 1. ],
       [0.5, 0.5],
       [0.5, 0.5]])
>>> # Divide into windows of size three (variants)
>>> ds = sg.window_by_variant(ds, size=3)
>>> sg.observed_heterozygosity(ds)["stat_observed_heterozygosity"].values
array([[1.5, 2.5],
       [1. , 1. ]])
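The windowed values in the last example exceed 1, which is consistent with the per-variant values being summed within each window. The sketch below checks that observation against the numbers printed in the examples. It is only a reading of the example output, not a statement about how sgkit computes windowed statistics, and the helper `sum_in_windows` is hypothetical.

```python
# Per-variant values copied from the first example output
# (rows: variants, columns: cohorts)
per_variant = [
    [0.5, 1.0],
    [1.0, 0.5],
    [0.0, 1.0],
    [0.5, 0.5],
    [0.5, 0.5],
]


def sum_in_windows(rows, size):
    """Sum each column over consecutive, non-overlapping windows of `size` rows."""
    windows = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]  # the last window may be shorter
        windows.append([sum(col) for col in zip(*chunk)])
    return windows


windowed = sum_in_windows(per_variant, size=3)
print(windowed)  # [[1.5, 2.5], [1.0, 1.0]]
```

Under this reading, dividing each windowed value by the number of variants in its window would give a per-variant average within the window.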