sgkit.ld_matrix
- sgkit.ld_matrix(ds, *, dosage='dosage', threshold=None, variant_score=None)
Compute a sparse linkage disequilibrium (LD) matrix.
This method computes the Rogers-Huff R2 value for each pair of variants within a window and returns the pairs whose value meets or exceeds the provided threshold, as a sparse matrix in dataframe form.
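As a rough illustration of the statistic, the sketch below assumes the Rogers-Huff R2 between two variants reduces to the squared Pearson correlation of their dosage vectors, skipping samples where either dosage is missing (NaN). The function name rogers_huff_r2 is hypothetical and not part of sgkit's API; this is a plain NumPy sketch, not sgkit's implementation.

```python
import numpy as np


def rogers_huff_r2(x, y):
    """Hypothetical sketch: squared Pearson correlation of two dosage vectors.

    Samples where either dosage is NaN (missing) are skipped. Returns NaN
    when no complete samples remain or either vector has zero variance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # complete cases only
    x, y = x[keep], y[keep]
    if x.size == 0:
        return float("nan")
    # Population (1/n) variances and covariance; the normalization cancels in r
    dx = x - x.mean()
    dy = y - y.mean()
    vx = np.mean(dx * dx)
    vy = np.mean(dy * dy)
    if vx == 0 or vy == 0:
        return float("nan")
    r = np.mean(dx * dy) / np.sqrt(vx * vy)
    return float(r * r)
```

Squaring discards the sign of r, so perfectly anti-correlated dosages (for example [0, 1, 2] against [2, 1, 0]) also give an R2 of 1.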
- Parameters
- ds : Dataset
Dataset containing genotype dosages. Must already be windowed with window_by_position() or window_by_variant().
- dosage : Hashable (default: 'dosage')
Name of genetic dosage variable. Defined by sgkit.variables.dosage_spec.
- threshold : float | None (default: None)
R2 threshold below which no variant pairs will be returned. This should almost always be set at least slightly above 0, to avoid returning the very large number of pairs with near-zero LD present in most datasets.
- variant_score : Hashable | None (default: None)
Name of an optional variable used to prioritize variant selection (e.g. minor allele frequency). Defined by sgkit.variables.variant_score_spec.
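The windowing and threshold behaviour can be sketched in plain NumPy and pandas. This sketch assumes windows are given as hypothetical [start, stop) variant index bounds (loosely modelled on the per-window bounds that windowing produces), that the dosage array has no missing values, and that pairs are only formed within a window, never across windows. windowed_r2_pairs is an illustrative name, not an sgkit function.

```python
import numpy as np
import pandas as pd


def windowed_r2_pairs(dosage, window_starts, window_stops, threshold=None):
    """Hypothetical sketch: r2 for every pair (i, j) with i <= j inside each window.

    dosage: (n_variants, n_samples) array with no missing values (assumed).
    window_starts / window_stops: hypothetical [start, stop) variant bounds.
    Returns a pandas DataFrame with columns i, j, value.
    """
    dosage = np.asarray(dosage, dtype=float)
    pairs = {}  # (i, j) -> r2; keying by pair drops repeats from overlapping windows
    for start, stop in zip(window_starts, window_stops):
        for i in range(start, stop):
            for j in range(i, stop):  # upper triangle, including the diagonal
                if (i, j) in pairs:
                    continue
                # Zero-variance rows give NaN; silence the divide warning
                with np.errstate(invalid="ignore", divide="ignore"):
                    r = np.corrcoef(dosage[i], dosage[j])[0, 1]
                r2 = float(r * r)
                # Keep everything when no threshold is set; otherwise keep
                # finite values at or above the threshold
                if threshold is None or (np.isfinite(r2) and r2 >= threshold):
                    pairs[(i, j)] = r2
    rows = [(i, j, value) for (i, j), value in sorted(pairs.items())]
    return pd.DataFrame(rows, columns=["i", "j", "value"])
```

With non-overlapping windows such as starts [0, 2] and stops [2, 4], a pair like (0, 2) is never evaluated because its variants fall in different windows; with overlapping windows, a pair seen in several windows is reported only once.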
- Return type
DataFrame
- Returns
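Upper triangle (including the diagonal) of the LD matrix, in COO format, as a dataframe. Fields:
i: Row (variant) index of the first variant in the pair
j: Row (variant) index of the second variant in the pair (j >= i)
value: R2 value
cmp: If variant_score is provided, this is 1, 0, or -1, indicating whether the score of variant i is greater than (1), less than (-1), or equal to (0) the score of variant j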
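Because only the upper triangle is stored, and pairs below the threshold or outside every window are omitted, code that needs a square matrix has to mirror the entries. The sketch below assumes an in-memory pandas DataFrame with integer i and j columns (a Dask DataFrame would first need .compute()), and fills omitted pairs with NaN rather than 0, since a missing entry means "not computed or below threshold" rather than "zero LD". coo_upper_to_dense is a hypothetical helper, and the example rows are made up.

```python
import numpy as np
import pandas as pd


def coo_upper_to_dense(df, n_variants):
    """Hypothetical helper: expand upper-triangle COO rows (columns i, j, value)
    into a dense, symmetric (n_variants, n_variants) array. Pairs with no row are NaN."""
    dense = np.full((n_variants, n_variants), np.nan)
    i = df["i"].to_numpy()
    j = df["j"].to_numpy()
    values = df["value"].to_numpy()
    dense[i, j] = values
    # Mirror into the lower triangle; diagonal entries are rewritten with the same value
    dense[j, i] = values
    return dense


# Made-up rows in the documented i, j, value layout
ld = pd.DataFrame({"i": [0, 0, 1, 2], "j": [0, 1, 1, 2], "value": [1.0, 0.8, 1.0, 1.0]})
m = coo_upper_to_dense(ld, 3)
```

A dense array grows quadratically with the number of variants, so this is only practical for small regions; for genome-scale results the sparse COO form is the reason the function returns a dataframe in the first place.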
- Raises
ValueError – If the dataset is not windowed.