Sparse matrix support in hierarchical clustering #173
Comments
Actually, it's somewhat intriguing to have a sparse distance matrix.
Regarding the updating of the distance matrix, there is no difference between sparse and dense storage. I agree that sparse distance matrices are rare, but they can occur with non-metric distances. In my case I have to deal with very large matrices where the distances between many pairs of data points are not known (or not needed). In these cases the distance is set to 0. More similar data points have a negative distance; for duplicates, for example, the distance would be -1.
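To make the convention described above concrete, here is a minimal sketch of how such a matrix might be built with the SparseArrays standard library. The five points, the index vectors, and the values are hypothetical illustration data, not taken from the actual use case.

```julia
using SparseArrays

# Hypothetical data: 5 points, only three pairwise distances are known.
rows = [1, 2, 4]
cols = [2, 3, 5]
vals = [-1.0, -0.4, -0.7]   # -1.0 marks a duplicate pair

n = 5
# Store every known pair in both triangles so the matrix is symmetric.
# Unknown pairs are simply not stored and therefore read back as 0.
ds = sparse([rows; cols], [cols; rows], [vals; vals], n, n)

println(nnz(ds))            # number of explicitly stored entries
println(ds[2, 1])           # a known pair
println(ds[1, 5])           # an unknown pair
```

Only the known pairs occupy memory; every other entry is an implicit zero.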
Thanks for the clarifications. But you can try introducing the type parameter that defines which matrix format to use. Something like this:

```julia
function hclust_minimum(::Type{M}, ds::AbstractMatrix{T}) where {M<:AbstractMatrix, T<:Real}
    d = M(ds)
    # ...
end
```

And
Thank you for the suggestions. I think it could be quite a flexible solution.
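The type-parameter idea suggested above can be tried in isolation. The sketch below is only an illustration: `working_copy` is a hypothetical helper name, not a function in Clustering.jl, and the 3×3 input is made-up data. It shows that the same call site can request either a dense or a sparse working matrix.

```julia
using SparseArrays

# Hypothetical helper (not part of Clustering.jl): convert the input into
# the requested matrix format M before any clustering work would start.
function working_copy(::Type{M}, ds::AbstractMatrix{T}) where {M<:AbstractMatrix, T<:Real}
    return M(ds)   # Matrix(ds) densifies; SparseMatrixCSC(ds) stays sparse
end

# Made-up 3×3 sparse distance matrix with one known (duplicate) pair.
ds = sparse([1, 2], [2, 1], [-1.0, -1.0], 3, 3)

d_dense  = working_copy(Matrix, ds)
d_sparse = working_copy(SparseMatrixCSC, ds)

println(typeof(d_dense))
println(typeof(d_sparse))
```

With this shape the caller decides the storage format, so a sparse input is never forced through a dense conversion unless `Matrix` is requested explicitly.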
I ran into an out-of-memory error when applying single-linkage hierarchical clustering to a large sparse distance matrix.
I figured out that the problem is in the first line of the hclust_minimum function.
This line creates a dense matrix from any kind of input distance matrix. If ds is a sparse matrix that still fits into memory, this line can crash the procedure if the dense counterpart of ds becomes too big for the working memory.
This issue could easily be solved by using a SparseMatrixCSC matrix from the SparseArrays module.
I think the other types of hierarchical clustering will have a similar problem.
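A rough sketch of why the dense conversion described above fails: the size of the dense counterpart can be estimated without allocating it and compared with the memory the sparse matrix actually holds. The helper `dense_bytes`, the problem size `n`, and the banded test matrix are assumptions chosen for illustration only.

```julia
using SparseArrays

# Hypothetical helper: bytes a dense n×n matrix of element type T would
# occupy, computed arithmetically so that nothing is allocated.
dense_bytes(n::Integer, ::Type{T}) where {T} = Int(n)^2 * sizeof(T)

n = 200_000   # assumed number of data points

# Made-up sparse matrix with one known distance per row (n - 1 entries).
A = sparse(collect(1:n-1), collect(2:n), fill(-1.0, n - 1), n, n)

sparse_size = Base.summarysize(A)       # bytes actually held by A
dense_size  = dense_bytes(n, Float64)   # bytes the dense counterpart needs

println("sparse: ", sparse_size, " bytes")
println("dense:  ", dense_size, " bytes")
```

For this size the dense counterpart would need 320 GB, while the sparse matrix occupies only a few megabytes, which is why a conversion to dense storage can exhaust memory even though the input itself fits comfortably.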