sgkit.ld_matrix
- sgkit.ld_matrix(ds, *, dosage='call_dosage', threshold=None, variant_score=None)
Compute a sparse linkage disequilibrium (LD) matrix.
This method computes the Rogers-Huff R2 value for each pair of variants in a window and returns the pairs whose value exceeds the provided threshold, as a sparse matrix in dataframe form.
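To make the shape of the result concrete, here is a minimal pure-Python sketch of the idea. It is not sgkit's implementation. It assumes that the R2 value can be approximated by the squared Pearson correlation of the dosage vectors, that all variants fall in a single window, that missing data can be ignored, and that a pair is kept when its value is greater than or equal to the threshold. The helper names `pearson_r2` and `sparse_ld` and the dosage values are hypothetical.

```python
from itertools import combinations_with_replacement


def pearson_r2(x, y):
    """Squared Pearson correlation between two dosage vectors (assumed stand-in for R2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a variant has no variation
    return cov * cov / (var_x * var_y)


def sparse_ld(dosages, threshold=None):
    """Return COO-style rows for the upper triangle, including the diagonal."""
    rows = []
    for i, j in combinations_with_replacement(range(len(dosages)), 2):
        value = pearson_r2(dosages[i], dosages[j])
        # Assumption: keep pairs at or above the threshold; NaN never passes.
        if threshold is None or value >= threshold:
            rows.append({"i": i, "j": j, "value": value})
    return rows


# Hypothetical dosages: 3 variants x 4 samples, treated as one window.
dosages = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],  # identical to variant 0, so fully correlated with it
    [1, 0, 1, 2],  # uncorrelated with variants 0 and 1
]
rows = sparse_ld(dosages, threshold=0.2)
for row in rows:
    print(row)
```

With the threshold applied, the two pairs involving variant 2 and another variant are dropped, which is why a threshold slightly above 0 keeps the result sparse.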
- Parameters:
- ds
Dataset
Dataset containing genotype dosages. Must already be windowed with window_by_position() or window_by_variant().
- dosage
Hashable (default: 'call_dosage')
Name of genetic dosage variable. Defined by sgkit.variables.call_dosage_spec.
- threshold
float | None (default: None)
R2 threshold below which no variant pairs will be returned. This should almost always be set at least slightly above 0, because most datasets contain a large density of variant pairs with LD very near zero.
- variant_score
Hashable | None (default: None)
Optional name of variable to use to prioritize variant selection (e.g. minor allele frequency). Defined by sgkit.variables.variant_score_spec.
- Return type:
- Returns:
Upper triangle (including diagonal) of LD matrix as COO in dataframe. Fields:
- i: Row (variant) index 1
- j: Row (variant) index 2
- value: R2 value
- cmp: If variant_score is provided, this is 1, 0, or -1, indicating whether i > j (1), i < j (-1), or i == j (0)
- Raises:
ValueError – If the dataset is not windowed.
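Because only the upper triangle is returned in COO form, a consumer that needs a dense symmetric matrix has to mirror each row. The following sketch shows one way to do that with plain Python lists. The helper `coo_to_dense` and the sample rows are hypothetical, and the fill value is a placeholder: a pair missing from the result may have fallen below the threshold or outside any shared window, so 0.0 is not its true LD.

```python
def coo_to_dense(rows, n_variants, fill=0.0):
    """Expand upper-triangle (i, j, value) rows into a dense symmetric matrix."""
    dense = [[fill] * n_variants for _ in range(n_variants)]
    for row in rows:
        i, j, value = row["i"], row["j"], row["value"]
        dense[i][j] = value
        dense[j][i] = value  # mirror into the lower triangle
    return dense


# Hypothetical rows in the documented layout (fields i, j, value).
rows = [
    {"i": 0, "j": 0, "value": 1.0},
    {"i": 0, "j": 1, "value": 0.8},
    {"i": 1, "j": 1, "value": 1.0},
    {"i": 2, "j": 2, "value": 1.0},
]
dense = coo_to_dense(rows, n_variants=3)
for line in dense:
    print(line)
```

For large variant counts the dense form defeats the purpose of the sparse output, so this is only practical for small regions.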