sgkit.cohort_allele_frequencies
- sgkit.cohort_allele_frequencies(ds, *, cohort_allele_count='cohort_allele_count', merge=True)
Compute allele frequencies for each cohort.
- Parameters:
- ds
Dataset
Dataset containing genotype calls.
- cohort_allele_count
Hashable
(default: 'cohort_allele_count') Input variable name holding cohort_allele_count as defined by sgkit.variables.cohort_allele_count_spec. If the variable is not present in ds, it will be computed using count_cohort_alleles().
- merge
bool
(default: True) If True (the default), merge the input dataset and the computed output variables into a single dataset; otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type:
Dataset
- Returns:
A dataset containing sgkit.variables.cohort_allele_frequency_spec of allele frequencies with shape (variants, cohorts, alleles) and values corresponding to the frequency of non-missing occurrences of each allele.
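The relationship between cohort allele counts and the returned frequencies can be sketched with plain NumPy. This is a minimal illustration, not sgkit's implementation: the helper name `frequencies_from_counts` and the example counts are hypothetical, and the NaN result for a cohort with no non-missing calls is an assumption of this sketch.

```python
import numpy as np

def frequencies_from_counts(cohort_allele_count):
    """Convert per-cohort allele counts into per-cohort allele frequencies.

    cohort_allele_count: array of shape (variants, cohorts, alleles).
    Returns a float array of the same shape. Where a cohort has no
    non-missing calls at a variant, the result is NaN (assumption of
    this sketch, not a documented sgkit guarantee).
    """
    counts = np.asarray(cohort_allele_count, dtype=float)
    # Total number of non-missing allele observations per (variant, cohort).
    totals = counts.sum(axis=-1, keepdims=True)
    # Suppress the 0/0 warning; those entries become NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return counts / totals

# One variant, two cohorts, two alleles (made-up counts).
counts = np.array([[[3, 1], [2, 2]]])
freqs = frequencies_from_counts(counts)

# A cohort with no observed alleles at this variant.
empty = frequencies_from_counts(np.array([[[0, 0]]]))
```

Because the counts only include non-missing allele calls, each (variant, cohort) row of frequencies sums to 1 whenever at least one allele was observed.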
Examples
>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)

>>> # Divide samples into two cohorts
>>> ds["sample_cohort"] = xr.DataArray(np.repeat([0, 1], ds.sizes["samples"] // 2), dims="samples")
>>> sg.display_genotypes(ds)
samples    S0   S1   S2   S3
variants
0         0/0  1/0  1/0  0/1
1         1/0  0/1  0/0  1/0
2         1/1  0/0  1/0  0/1
3         1/0  1/1  1/1  1/0
4         1/0  0/0  1/0  1/1

>>> sg.cohort_allele_frequencies(ds)["cohort_allele_frequency"].values
array([[[0.75, 0.25],
        [0.5 , 0.5 ]],

       [[0.5 , 0.5 ],
        [0.75, 0.25]],

       [[0.5 , 0.5 ],
        [0.5 , 0.5 ]],

       [[0.25, 0.75],
        [0.25, 0.75]],

       [[0.75, 0.25],
        [0.25, 0.75]]])
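As a cross-check, the frequencies in this example can be reproduced by hand from the displayed genotypes using only NumPy. The sketch below is illustrative and does not call sgkit; the variable names (`calls`, `sample_cohort`, `manual_freqs`) are chosen here for the illustration, and the genotype values are copied from the `display_genotypes` output above.

```python
import numpy as np

# Genotype calls copied from the display_genotypes output,
# shape (variants, samples, ploidy).
calls = np.array([
    [[0, 0], [1, 0], [1, 0], [0, 1]],
    [[1, 0], [0, 1], [0, 0], [1, 0]],
    [[1, 1], [0, 0], [1, 0], [0, 1]],
    [[1, 0], [1, 1], [1, 1], [1, 0]],
    [[1, 0], [0, 0], [1, 0], [1, 1]],
])
# S0 and S1 belong to cohort 0; S2 and S3 belong to cohort 1.
sample_cohort = np.array([0, 0, 1, 1])
n_cohorts, n_alleles = 2, 2

# Count occurrences of each allele within each cohort.
counts = np.zeros((calls.shape[0], n_cohorts, n_alleles), dtype=int)
for c in range(n_cohorts):
    cohort_calls = calls[:, sample_cohort == c, :]  # (variants, cohort samples, ploidy)
    for a in range(n_alleles):
        # A negative (missing) call never equals an allele index,
        # so it is excluded from the count.
        counts[:, c, a] = (cohort_calls == a).sum(axis=(1, 2))

# Frequencies with shape (variants, cohorts, alleles).
manual_freqs = counts / counts.sum(axis=-1, keepdims=True)
```

For variant 0, cohort 0 contains the calls 0/0 and 1/0, that is three copies of allele 0 and one copy of allele 1, which gives the first row `[0.75, 0.25]` of the array shown above.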