Sparse matrix support in hierarchical clustering #173

Open
@hecking

Description

I ran into an out-of-memory error when applying single-linkage hierarchical clustering to a large sparse distance matrix.

The problem is in the first line of the hclust_minimum function:

function hclust_minimum(ds::AbstractMatrix{T}) where T<:Real
     d = Matrix(ds)

This creates a dense matrix from any kind of input distance matrix. If ds is a sparse matrix that still fits into memory, this line can crash the procedure because the dense counterpart of ds becomes too big for the working memory.
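To make the scale of the problem concrete, here is a rough sketch comparing the two representations. The sizes (`n`, `density`, `big_n`) are made-up values for illustration, not measurements from real data, and the dense estimate assumes `Float64` entries.

```julia
using SparseArrays

# Hypothetical sizes, chosen only for illustration.
n = 2_000
density = 0.001

# Random sparse stand-in for a distance matrix.
ds = sprand(n, n, density)

# Memory actually held by the sparse matrix (values, row indices, column pointers).
sparse_bytes = Base.summarysize(ds)

# Memory the element data of Matrix(ds) would need: one Float64 per entry.
dense_bytes = n * n * sizeof(Float64)

println("sparse: ", sparse_bytes, " bytes")
println("dense:  ", dense_bytes, " bytes")

# Same dense estimate at a size where the copy is unlikely to fit in RAM.
big_n = 200_000
println("dense at n = ", big_n, ": ", big_n^2 * sizeof(Float64) / 2^30, " GiB")
```

The dense cost grows with n^2 regardless of how many distances are actually stored, which is why an input that fits comfortably in memory can still trigger the crash.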

This issue could be solved by using a SparseMatrixCSC from the SparseArrays standard library instead:

function hclust_minimum(ds::AbstractMatrix{T}) where T<:Real
     d = SparseMatrixCSC(ds)
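A quick check of what that conversion does, as a sketch rather than a tested patch. `sparse`, `nnz` and `SparseMatrixCSC` come from the SparseArrays standard library; the small 4x4 matrix is invented for the example. One thing that would need checking: entries not stored in a sparse matrix read back as zero, so the rest of the function would have to treat them as missing distances rather than as a distance of 0.

```julia
using SparseArrays

# Invented 4x4 symmetric example with three stored distances per triangle.
rows = [1, 2, 2, 3, 3, 4]
cols = [2, 1, 3, 2, 4, 3]
vals = [0.5, 0.5, 1.2, 1.2, 0.7, 0.7]
ds = sparse(rows, cols, vals, 4, 4)

# The proposed first line: the result stays sparse, no n-by-n dense allocation.
d = SparseMatrixCSC(ds)
@show typeof(d) nnz(d)

# A dense input is still accepted; it is converted to sparse storage.
dd = SparseMatrixCSC(Matrix(ds))
@show nnz(dd)

# Unstored entries read as 0.0, which is not the same as "distance 0".
@show d[1, 4]
```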

I think the other types of hierarchical clustering have the same problem.
