sgkit.maximal_independent_set
- sgkit.maximal_independent_set(df)
Compute a maximal independent set of variants.
This method is based on the PLINK algorithm that selects independent vertices from a graph implied by excessive LD between variants.
For an outline of this process, see this discussion.
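As a rough illustration of the idea described above, the following is a minimal pure-Python sketch of a greedy independent-set selection over an edge list. It is not sgkit's implementation: the function name `greedy_independent_set`, the plain `(i, j)` tuple input, and the rule of always keeping the lower-indexed variant are assumptions made for this example, and any score-based choice between the two variants of a pair is ignored.

```python
def greedy_independent_set(edges, n_variants):
    """Return (kept, dropped) variant indexes.

    `edges` is an iterable of (i, j) pairs, one per variant pair in
    excessive LD, sorted ascending by i and then by j.
    Hypothetical helper for illustration only.
    """
    dropped = set()
    previous = None
    for i, j in edges:
        # Mirror the documented requirement that i and j are sorted.
        if previous is not None and (i, j) < previous:
            raise ValueError("edges must be sorted ascending by i, then j")
        previous = (i, j)
        if i >= j:
            continue  # assumption: only upper-triangle pairs are considered
        if i in dropped:
            continue  # a dropped variant cannot exclude its neighbours
        dropped.add(j)  # assumption: keep the lower-indexed variant
    kept = [v for v in range(n_variants) if v not in dropped]
    return kept, sorted(dropped)


# Toy LD graph with edges 0-1, 0-2, 1-3 and 3-4.
edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
kept, dropped = greedy_independent_set(edges, n_variants=5)
print(kept, dropped)  # [0, 3] [1, 2, 4]
```

Because the edges are sorted by `i`, the status of variant `i` is final by the time its own edges are processed. As a result, no two kept variants share an edge, and every dropped variant has at least one kept neighbour.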
- Parameters:
- df : DataFrame
Dataframe containing a sparse matrix of R2 values, typically from sgkit.ld_matrix().
Warning
This algorithm will materialize the whole input dataframe (the LD matrix) in memory.
- Raises:
ValueError – If i and j are not sorted ascending (and in that order).
- Return type:
Dataset
- Returns:
A dataset containing the indexes of variants to drop, as defined by sgkit.variables.ld_prune_index_to_drop_spec.