Inquiry on Utilizing UMAP for Text Similarity and Clustering #1113
Comments
Thank you for the kind words. Mostly what UMAP will buy you over using DBSCAN directly on the embedding vectors is a lot more of your data clustered while still having reasonably fine-grained clusters. Can I guarantee better results? I think there are no guarantees, especially in unsupervised learning. Would I expect better results if you use UMAP first and then DBSCAN or HDBSCAN? Yes, I definitely would. Choosing parameters is always going to come down to the data you have, the kinds of results you want to get, and what you are going to use the clustering for from there. Some rules of thumb:
Thank you for your kind response. I'll start as you suggested! Can UMAP be updated in batches? Is it possible to create a UMAP model for large images and further train it? It seems impossible due to UMAP's mechanics, but I wonder if implementing this feature would be difficult.
I think for that use case you might want to look into ParametricUMAP. UMAP does have an
Thank you for your response! I will check out ParametricUMAP!
Hello,
I would like to express my sincere appreciation for your passionate communication and efficient package management. I have reviewed the documentation and code related to the use of UMAP, but find myself in need of expert advice.
My intention is to use UMAP for clustering and for measuring similarity between arrays of sentence embeddings. The data has no labels, and I have several questions about this process. I would also like to understand, in a principled way, how text similarity results compare with the outcomes UMAP provides.
Any keywords, references, or preliminary answers you could provide would be greatly appreciated.
Thank you once again for your wonderful project.