sgkit.count_cohort_alleles

sgkit.count_cohort_alleles(ds, *, call_allele_count='call_allele_count', sample_cohort='sample_cohort', merge=True)

Compute per-cohort allele counts from per-sample allele counts or genotype calls.

Parameters:
ds Dataset

Dataset containing genotype calls.

call_allele_count Hashable (default: 'call_allele_count')

Input variable name holding call_allele_count as defined by sgkit.variables.call_allele_count_spec. If the variable is not present in ds, it is computed from the genotype calls using count_call_alleles().

sample_cohort Hashable (default: 'sample_cohort')

Input variable name holding sample_cohort as defined by sgkit.variables.sample_cohort_spec.

merge bool (default: True)

If True (the default), merge the input dataset and the computed output variables into a single dataset. If False, return only the computed output variables. See Dataset merge behavior for more details.

Return type:

Dataset

Returns:

A dataset containing sgkit.variables.cohort_allele_count_spec, an array of allele counts with shape (variants, cohorts, alleles) whose values are the number of non-missing occurrences of each allele in each cohort.

Examples

>>> import numpy as np
>>> import sgkit as sg
>>> import xarray as xr
>>> ds = sg.simulate_genotype_call_dataset(n_variant=5, n_sample=4)
>>> # Divide samples into two cohorts
>>> ds["sample_cohort"] = xr.DataArray(np.repeat([0, 1], ds.sizes["samples"] // 2), dims="samples")
>>> sg.display_genotypes(ds) 
samples    S0   S1   S2   S3
variants
0         0/0  1/0  1/0  0/1
1         1/0  0/1  0/0  1/0
2         1/1  0/0  1/0  0/1
3         1/0  1/1  1/1  1/0
4         1/0  0/0  1/0  1/1
>>> sg.count_cohort_alleles(ds)["cohort_allele_count"].values 
array([[[3, 1],
        [2, 2]],

       [[2, 2],
        [3, 1]],

       [[2, 2],
        [2, 2]],

       [[1, 3],
        [1, 3]],

       [[3, 1],
        [1, 3]]], dtype=uint64)
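
To make the aggregation concrete, the following NumPy-only sketch reproduces the counts above by hand. It is an illustration of the arithmetic, not sgkit's implementation: the helper name cohort_allele_counts and its arguments are hypothetical, and the genotype array is transcribed from the display_genotypes output in the example.

```python
import numpy as np


def cohort_allele_counts(call_genotype, sample_cohort, n_alleles, n_cohorts):
    """Hypothetical helper (not part of sgkit) that sums allele counts per cohort.

    call_genotype: int array of shape (variants, samples, ploidy); -1 marks a missing allele.
    sample_cohort: int array of shape (samples,) giving each sample's cohort index.
    """
    n_variants, n_samples, _ = call_genotype.shape

    # Step 1: per-sample allele counts, shape (variants, samples, alleles).
    # Missing alleles (-1) match no allele index, so they are never counted.
    call_ac = np.zeros((n_variants, n_samples, n_alleles), dtype=np.uint64)
    for allele in range(n_alleles):
        call_ac[..., allele] = (call_genotype == allele).sum(axis=-1)

    # Step 2: sum the per-sample counts over the samples in each cohort.
    # In this sketch, samples whose index matches no cohort are simply skipped.
    out = np.zeros((n_variants, n_cohorts, n_alleles), dtype=np.uint64)
    for cohort in range(n_cohorts):
        out[:, cohort, :] = call_ac[:, sample_cohort == cohort, :].sum(axis=1)
    return out


# Genotypes transcribed from the example above (variants x samples x ploidy).
call_genotype = np.array(
    [
        [[0, 0], [1, 0], [1, 0], [0, 1]],
        [[1, 0], [0, 1], [0, 0], [1, 0]],
        [[1, 1], [0, 0], [1, 0], [0, 1]],
        [[1, 0], [1, 1], [1, 1], [1, 0]],
        [[1, 0], [0, 0], [1, 0], [1, 1]],
    ]
)
sample_cohort = np.repeat([0, 1], 2)  # S0, S1 in cohort 0; S2, S3 in cohort 1

counts = cohort_allele_counts(call_genotype, sample_cohort, n_alleles=2, n_cohorts=2)
print(counts[0])  # counts for variant 0, one row per cohort
```

For variant 0, cohort 0 (S0 = 0/0, S1 = 1/0) contains three copies of allele 0 and one of allele 1, and cohort 1 (S2 = 1/0, S3 = 0/1) contains two of each, matching the first block of the array returned by count_cohort_alleles.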