sgkit.count_cohort_alleles
- sgkit.count_cohort_alleles(ds, *, call_allele_count='call_allele_count', sample_cohort='sample_cohort', merge=True)
Compute per-cohort allele counts from per-sample allele counts or, when those are not present in the dataset, from genotype calls.
- Parameters:
- ds
Dataset
Dataset containing genotype calls.
- call_allele_count
Hashable (default: 'call_allele_count')
Input variable name holding call_allele_count as defined by sgkit.variables.call_allele_count_spec. If the variable is not present in ds, it will be computed using count_call_alleles().
- sample_cohort
Hashable (default: 'sample_cohort')
Input variable name holding sample_cohort as defined by sgkit.variables.sample_cohort_spec.
- merge
bool (default: True)
If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type:
Dataset
- Returns:
A dataset containing sgkit.variables.cohort_allele_count_spec of allele counts with shape (variants, cohorts, alleles) and values corresponding to the number of non-missing occurrences of each allele.
Examples
>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts
>>> ds["sample_cohort"] = xr.DataArray(np.repeat([0, 1], ds.sizes["samples"] // 2), dims="samples")
>>> sg.display_genotypes(ds)
samples    S0   S1   S2   S3
variants
0         0/0  1/0  1/0  0/1
1         1/0  0/1  0/0  1/0
2         1/1  0/0  1/0  0/1
3         1/0  1/1  1/1  1/0
4         1/0  0/0  1/0  1/1
>>> sg.count_cohort_alleles(ds)["cohort_allele_count"].values
array([[[3, 1],
        [2, 2]],
       [[2, 2],
        [3, 1]],
       [[2, 2],
        [2, 2]],
       [[1, 3],
        [1, 3]],
       [[3, 1],
        [1, 3]]], dtype=uint64)
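To make the aggregation concrete, the following is a minimal NumPy-only sketch of the same computation. It is not sgkit's implementation. The helper name cohort_allele_counts_sketch and its n_cohorts argument are hypothetical, and the sketch assumes that missing alleles are encoded as negative values in the genotype array and that samples with a negative cohort index belong to no cohort. It first derives per-sample allele counts from the genotype calls shown above, then sums them within each cohort.

```python
import numpy as np


def cohort_allele_counts_sketch(call_allele_count, sample_cohort, n_cohorts):
    """Sum per-sample allele counts within each cohort (illustrative only).

    call_allele_count: int array of shape (variants, samples, alleles)
    sample_cohort: int array of shape (samples,); negative values are
        assumed to mean "not in any cohort" and are skipped
    n_cohorts: number of cohorts (hypothetical argument for this sketch)
    """
    n_variants, _, n_alleles = call_allele_count.shape
    out = np.zeros((n_variants, n_cohorts, n_alleles), dtype=np.uint64)
    for c in range(n_cohorts):
        in_cohort = sample_cohort == c  # boolean mask over samples
        out[:, c, :] = call_allele_count[:, in_cohort, :].sum(
            axis=1, dtype=np.uint64
        )
    return out


# Genotype calls from the example above, shape (variants, samples, ploidy)
gt = np.array([
    [[0, 0], [1, 0], [1, 0], [0, 1]],
    [[1, 0], [0, 1], [0, 0], [1, 0]],
    [[1, 1], [0, 0], [1, 0], [0, 1]],
    [[1, 0], [1, 1], [1, 1], [1, 0]],
    [[1, 0], [0, 0], [1, 0], [1, 1]],
])

# Per-sample allele counts, shape (variants, samples, alleles).
# Negative (assumed missing) alleles never match an allele index,
# so they are not counted.
n_alleles = 2
call_ac = np.stack([(gt == a).sum(axis=-1) for a in range(n_alleles)], axis=-1)

sample_cohort = np.array([0, 0, 1, 1])
result = cohort_allele_counts_sketch(call_ac, sample_cohort, n_cohorts=2)
print(result.shape)  # (5, 2, 2): (variants, cohorts, alleles)
```

For the genotypes above, result holds the same values as the cohort_allele_count array returned in the example. The explicit loop over cohorts favors readability over speed.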