sgkit.filter_partial_calls
- sgkit.filter_partial_calls(ds, *, call_genotype='call_genotype', merge=True)
Replace partial genotype calls with missing values.
- Parameters:
- ds (Dataset): Dataset containing genotype calls.
- call_genotype (Hashable, default: 'call_genotype'): Input variable name holding call_genotype as defined by sgkit.variables.call_genotype_spec.
- merge (bool, default: True): If True (the default), merge the input dataset and the computed output variables into a single dataset, otherwise return only the computed output variables. See Dataset merge behavior for more details.
- Return type: Dataset
- Returns: Dataset containing sgkit.variables.call_genotype_complete_spec and sgkit.variables.call_genotype_complete_mask_spec, in which partial genotype calls are replaced with completely missing genotype calls.
Examples
>>> import sgkit as sg
>>> from sgkit.testing import simulate_genotype_call_dataset
>>> ds = simulate_genotype_call_dataset(n_variant=4, n_sample=2, seed=1, missing_pct=0.3)
>>> sg.display_genotypes(ds)
samples    S0   S1
variants
0         ./0  ./.
1         ./0  1/1
2         0/1  ./0
3         ./0  0/0
>>> ds2 = sg.filter_partial_calls(ds)
>>> ds2['call_genotype'] = ds2['call_genotype_complete']
>>> ds2['call_genotype_mask'] = ds2['call_genotype_complete_mask']
>>> sg.display_genotypes(ds2)
samples    S0   S1
variants
0         ./.  ./.
1         ./.  1/1
2         0/1  ./.
3         ./.  0/0
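To make the effect of the filter concrete, here is a rough NumPy sketch of the semantics described above. It is an illustration only and not sgkit's actual implementation: it assumes missing alleles are encoded as -1, hard-codes the genotypes printed in the example, and ignores any special handling sgkit may apply (for instance for mixed-ploidy data).

```python
import numpy as np

# Genotype calls with shape (variants, samples, ploidy).
# Assumption: -1 marks a missing allele; values copy the example output above.
call_genotype = np.array([
    [[-1, 0], [-1, -1]],
    [[-1, 0], [1, 1]],
    [[0, 1], [-1, 0]],
    [[-1, 0], [0, 0]],
])

# A call is treated as incomplete if any of its alleles is missing.
has_missing = (call_genotype < 0).any(axis=-1)

# Set every allele of an incomplete call to missing; complete calls are unchanged.
call_genotype_complete = np.where(has_missing[..., np.newaxis], -1, call_genotype)

# The matching mask flags each missing allele in the filtered array.
call_genotype_complete_mask = call_genotype_complete < 0
```

In this sketch a partial call such as ./0 becomes ./., while fully called genotypes (0/1, 1/1, 0/0) and fully missing ones are left as they were.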
Notes
The returned dataset will still contain the initial call_genotype and call_genotype_mask variables. Many sgkit functions default to using call_genotype and/or call_genotype_mask, so to exclude partial calls from further analysis you must either overwrite these variables (see the example) or explicitly pass the new variables as function arguments.