Density heatmap for large datasets #314

Open
JPDarby opened this issue Oct 30, 2023 · 5 comments
Labels
component-map Issues related to the map component

Comments

@JPDarby

JPDarby commented Oct 30, 2023

For very large datasets it would be nice to have the option of replacing the scatter plot with a density heatmap. I'm imagining loading a random structure from each "bin" and maybe dynamically updating the binning with the zoom level.

Happy to have a go at this myself but keen for any suggestions!

@Luthaf already suggested using a custom loadStructure callback for visualising the structures on demand.
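A minimal sketch of the binning idea above, in plain Python and independent of chemiscope or plotly: points are grouped into a fixed grid over the visible range, each non-empty cell reports a count (the heatmap value), and one randomly chosen member stands in as the structure to load for that cell. The function names `bin_points` and `summarise_bins` are hypothetical, and the square `n_bins` x `n_bins` grid is an assumption made for brevity.

```python
import random
from collections import defaultdict


def bin_points(points, x_range, y_range, n_bins):
    """Group point indices by the (column, row) grid cell they fall in."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    bins = defaultdict(list)
    for index, (x, y) in enumerate(points):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # outside the visible region
        # Clamp so points on the upper edge land in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * n_bins), n_bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * n_bins), n_bins - 1)
        bins[(col, row)].append(index)
    return bins


def summarise_bins(bins, rng):
    """Return {cell: (count, representative_index)} for every non-empty cell."""
    return {
        cell: (len(members), rng.choice(members))
        for cell, members in bins.items()
    }


# Toy 2D projection; in practice these would be the reduced descriptors.
points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.8), (0.5, 0.5)]
bins = bin_points(points, x_range=(0.0, 1.0), y_range=(0.0, 1.0), n_bins=2)
summary = summarise_bins(bins, random.Random(0))
```

Calling `bin_points` again with the zoomed axis ranges and the same `n_bins` gives finer cells, which is one simple way to get zoom-dependent binning.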

Luthaf added the component-map label Oct 30, 2023
@ceriottm
Contributor

This would be an excellent feature, but it is unclear (1) how much support there is for this kind of idea in plotly and (2) how much this would weigh on the memory footprint of the dataset and widget. Chemiscope is built assuming that everything can be made portable, and even the dynamic loading of structures is something we never exploited much.

Perhaps one possibility would be to still have only a few hardcoded representative structures in the dataset, but add volumetric data that can be visualized in plotly, to give a better sense of the distribution of data. In this sense, one could imagine providing "shape" data for the property panel similar to what we recently added for structures. This way, one could visualize a convex hull, or do something like a volumetric plot of the density of points.

Perhaps it'd help to advance the discussion if you explained what problem you are facing and want to solve.

@JPDarby
Author

JPDarby commented Oct 31, 2023

Thanks for the fast reply and consideration.

I'm hoping to use chemiscope to visualise structures stored in the NOMAD database. The dream is to have an interactive plot that updates in "near real time" as a user specifies or adjusts their query. This is an example query and there are already some interactive widgets. The whole database contains ~10 million structures, so I suspect (but admit I haven't checked this...) that visualising large queries will be very slow at the moment and that some sort of heatmap plus dynamic loading of structure data is the way to go.

We're planning to precompute averaged SOAP vectors (element agnostic) and MACE descriptors (learned alchemical embedding) for every structure, then use some combination of PCA and parametric UMAP for the dimensionality reduction, depending on the size of the query.

@JPDarby
Author

JPDarby commented Oct 31, 2023

In terms of your points:
(1) I don't think plotly supports dynamic rebinning. I'm happy to have a go at this. We could also switch from heatmap to scatter plot if a certain zoom threshold is crossed.
(2) Having a representative set of structures available to be viewed for each bin would be completely fine, and maybe this would be a good place to start.
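One way the heatmap/scatter switch in (1) could work, as a rough sketch: every time the visible axis ranges change, count the points in view and fall back to a scatter plot once that count drops below a threshold, otherwise recompute bin edges for the new ranges. Everything here is an assumption for illustration: the name `plan_view`, the use of a visible-point count as the "zoom threshold", and the default values. Where the ranges come from is left open (in plotly.js they would probably be read from a `plotly_relayout` event).

```python
def bin_edges(lo, hi, n_bins):
    """Evenly spaced edges covering the current axis range."""
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def plan_view(points, x_range, y_range, n_bins=50, scatter_threshold=2000):
    """Decide how to draw the region currently in view."""
    visible = [
        i
        for i, (x, y) in enumerate(points)
        if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]
    ]
    if len(visible) <= scatter_threshold:
        # Few enough points: draw them individually.
        return {"mode": "scatter", "indices": visible}
    # Too many points: keep the bin count fixed, so bins shrink with zoom.
    return {
        "mode": "heatmap",
        "x_edges": bin_edges(x_range[0], x_range[1], n_bins),
        "y_edges": bin_edges(y_range[0], y_range[1], n_bins),
    }


# Toy data along a diagonal, with tiny settings to make the switch visible.
points = [(i / 100, i / 100) for i in range(100)]
full_view = plan_view(points, (0.0, 1.0), (0.0, 1.0), n_bins=4, scatter_threshold=10)
zoomed_view = plan_view(points, (0.0, 0.055), (0.0, 0.055), n_bins=4, scatter_threshold=10)
```

Here `full_view` comes back as a heatmap plan and `zoomed_view` as a scatter plan listing the six points still in view.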

@Luthaf
Contributor

Luthaf commented Oct 31, 2023

Since this is intended for a specific deployment of chemiscope at NOMAD, another possible solution would be to replace the map widget entirely, and only re-use the other parts of the code. You could write a new widget with whatever technology works best to display the heatmap and link it to the chemiscope structure viewer, loading structures on-demand.

@ceriottm
Contributor

I think one possibility that would combine a lot of advantages and be relatively easy would be to have the query generate a chemiscope .json that is then loaded dynamically, "sparsified" to show some representative structures. If you then zoom in, there could be a button to update the view, re-generating a .json for that section.
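The "sparsify, then regenerate for the zoomed section" step could be sketched roughly as follows. Farthest point sampling is used here only as one plausible way to pick representative structures; the comment above does not specify a method. The helper names are hypothetical, and writing the selected structures into an actual chemiscope .json is not shown.

```python
import math


def farthest_point_sample(points, n_select, start=0):
    """Greedy farthest point sampling; returns indices into `points`."""
    if not points:
        return []
    n_select = min(n_select, len(points))
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_select:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[nxt] == 0:
            break  # only duplicates of already selected points remain
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


def sparsify_region(points, x_range, y_range, n_select):
    """Pick representative original indices among points inside the region."""
    visible = [
        i
        for i, (x, y) in enumerate(points)
        if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]
    ]
    local = farthest_point_sample([points[i] for i in visible], n_select)
    return [visible[j] for j in local]


# Toy projection: the last point lies outside the zoomed region.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
representatives = sparsify_region(points, (0.0, 1.0), (0.0, 1.0), n_select=3)
```

Pressing the "update view" button would then amount to calling `sparsify_region` with the current axis ranges and building a new dataset from the returned indices.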

3 participants