Plotting a condensed tree result using the hdbscan.plots.CondensedTree class #25

Open
u3ks opened this issue Oct 11, 2024 · 3 comments

Comments

@u3ks

u3ks commented Oct 11, 2024

Hi all,

I'm trying to plot the output of fast_hdbscan.cluster_trees.condense_tree using the hdbscan.plots.CondensedTree class.
I tried converting the result like so:

ct_raw = np.rec.fromarrays((ct[0], ct[1], ct[2], ct[3]), dtype=[('parent', np.intp), ('child', np.intp), ('lambda_val', float), ('child_size', np.intp)])

I then passed it to the constructor, CondensedTree(ct_raw), but I get an error suggesting that some parent nodes in the ct_raw array have no children.

Specifically, the .max() call below (from hdbscan.plots.CondensedTree.get_plot_data) throws an exception because it's being called on an empty array:

```python
for c in range(last_leaf, root - 1, -1):
    cluster_bounds[c] = [0, 0, 0, 0]

    c_children = self._raw_tree[self._raw_tree['parent'] == c]
    current_size = np.sum(c_children['child_size'])
    current_lambda = cluster_y_coords[c]
    cluster_max_size = current_size
    cluster_max_lambda = c_children['lambda_val'].max()
```

Do you have any pointers on how to convert between the two representations, or on how to change the get_plot_data function?
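
One way to narrow down which cluster ids trigger the empty slice is to list the ids that the loop would visit but that never occur in the parent column. The sketch below is only a diagnostic aid: it assumes (this is not stated in the thread) that root and last_leaf are the smallest and largest values of the parent column, and parents_without_children is a hypothetical helper, not part of hdbscan or fast_hdbscan.

```python
def parents_without_children(parents):
    """Return ids in [min(parents), max(parents)] that never occur as a parent."""
    # Assumption: the plotting loop walks every id between the smallest
    # and largest parent id, so any gap in that range yields an empty slice.
    seen = {int(p) for p in parents}
    if not seen:
        return []
    return [c for c in range(min(seen), max(seen) + 1) if c not in seen]


# Toy parent column: id 6 lies between the smallest and largest parent id
# but has no rows of its own.
toy_parents = [5, 5, 7, 7]
print(parents_without_children(toy_parents))  # [6]
```

With the arrays from the snippet above, the call would be parents_without_children(ct[0]), since ct[0] is the column mapped to the parent field.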

@lmcinnes
Contributor

You may have ended up with a condensed forest instead of a condensed tree. That shouldn't really be possible, but perhaps there is a bug that makes it possible? I would need to see the actual tree data to diagnose...
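
A quick way to test the forest hypothesis is to count root candidates, meaning parent ids that never appear as a child. The sketch below assumes the parent and child columns are available as plain sequences or arrays; find_roots is a hypothetical helper introduced here for illustration, and more than one returned id would suggest a forest rather than a single tree.

```python
def find_roots(parents, children):
    """Return the sorted parent ids that never appear in the child column."""
    child_ids = {int(c) for c in children}
    parent_ids = {int(p) for p in parents}
    return sorted(parent_ids - child_ids)


# Toy tree over 4 points (ids 0..3): cluster 4 splits into clusters 5 and 6.
tree_parents = [4, 4, 5, 5, 6, 6]
tree_children = [5, 6, 0, 1, 2, 3]
print(find_roots(tree_parents, tree_children))  # [4]

# Toy forest: clusters 4 and 6 are never anyone's child, so there are two roots.
forest_parents = [4, 4, 6, 6]
forest_children = [0, 1, 2, 3]
print(find_roots(forest_parents, forest_children))  # [4, 6]
```

For the data in this issue, the call would be find_roots(ct[0], ct[1]).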

@u3ks
Author

u3ks commented Oct 15, 2024

Actually, I think I found the issue: I was testing out the new sample weights functionality, and one of my samples had a weight larger than the specified min_cluster_size.

Maybe throwing a warning for this in the initial tree construction would be beneficial?
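
A minimal sketch of what such a check could look like is given below. It is not fast_hdbscan code: warn_on_heavy_samples is a hypothetical helper, and the strict "greater than" comparison simply mirrors the case described above, so the right threshold in the library may differ.

```python
import warnings


def warn_on_heavy_samples(sample_weights, min_cluster_size):
    """Warn if any single sample weight exceeds min_cluster_size.

    Returns the indices of the offending samples (empty if none).
    """
    heavy = [i for i, w in enumerate(sample_weights) if w > min_cluster_size]
    if heavy:
        warnings.warn(
            f"{len(heavy)} sample(s) have a weight larger than "
            f"min_cluster_size={min_cluster_size}; the condensed tree "
            "may be malformed.",
            UserWarning,
            stacklevel=2,
        )
    return heavy


# The second sample alone outweighs min_cluster_size, so it is flagged.
print(warn_on_heavy_samples([1.0, 12.0, 2.0], min_cluster_size=10))  # [1]
```

Running such a check before building the tree would surface the problem at the input stage instead of later, inside the plotting code.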

@lmcinnes
Contributor

Yes, that might be something that would be sensible. The sample weight stuff is pretty new so it isn't well tested yet.
