Description
I ran into an out-of-memory error when applying single-linkage hierarchical clustering to a large sparse distance matrix.
I traced the problem to the first line of the hclust_minimum function:
function hclust_minimum(ds::AbstractMatrix{T}) where T<:Real
d = Matrix(ds)
This creates a dense matrix from any kind of input distance matrix. If ds is a sparse matrix that fits into memory, this line can still crash the procedure when the dense counterpart of ds becomes too big for the working memory.
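To give a sense of the scale involved, here is a rough sketch that compares the storage of a sparse matrix with the storage its dense counterpart would need. The size `n` and the density are made-up values for illustration, not the ones from my data, and the dense size is only computed, never allocated.

```julia
using SparseArrays

# Illustrative values only (assumed, not taken from the real data set).
n = 50_000
density = 1e-4

# Random sparse matrix standing in for a sparse distance matrix.
ds = sprand(n, n, density)

# Memory actually held by the sparse matrix (values, row indices, column pointers).
sparse_bytes = Base.summarysize(ds)

# Memory that Matrix(ds) would have to allocate: one element per entry.
dense_bytes = n * n * sizeof(eltype(ds))

println("stored entries:   ", nnz(ds))
println("sparse size (MB): ", round(sparse_bytes / 1e6, digits = 1))
println("dense size (GB):  ", round(dense_bytes / 1e9, digits = 1))
```

With numbers like these the sparse matrix takes a few megabytes, while the dense copy would need tens of gigabytes.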
This issue could easily be solved by using a SparseMatrixCSC from the SparseArrays module instead:
function hclust_minimum(ds::AbstractMatrix{T}) where T<:Real
d = SparseMatrixCSC(ds)
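One possible variation, sketched below, is to pick the working copy by dispatch so that dense inputs keep their current behavior and only sparse inputs stay sparse. `working_copy` is a hypothetical helper name of mine, not something that exists in the package, and I have not checked whether the rest of hclust_minimum works unchanged on a sparse `d`.

```julia
using SparseArrays

# Hypothetical helper (not part of the package): choose the working copy by input type.
# Sparse inputs stay sparse, so no dense n-by-n matrix is allocated.
working_copy(ds::AbstractSparseMatrix) = copy(ds)
# Every other input is converted to a dense matrix, as the current code does.
working_copy(ds::AbstractMatrix) = Matrix(ds)

sparse_input = sprand(100, 100, 0.05)
dense_input = rand(4, 4)

d_sparse = working_copy(sparse_input)
d_dense = working_copy(dense_input)

println(typeof(d_sparse))
println(typeof(d_dense))
```

Converting every input with SparseMatrixCSC(ds) would also work for sparse data, but it would store dense inputs in sparse format, which needs more memory than a plain Matrix because the row indices are stored alongside the values.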
I think the other types of hierarchical clustering have a similar problem.