At the moment, it seems that PCA requires the potentially very large `n_structures x n_features` feature matrix as an argument. This will not fit in memory for very large datasets.
Perhaps it would be beneficial to design a custom PCA class that accumulates an `n_features x n_features` covariance matrix, which stays manageable in size and can be diagonalized once all structures have been processed. This should make it possible to explore potentially huge datasets even on ordinary laptops, using batched evaluation (at the cost of a few hours of runtime).
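A minimal sketch of the idea, assuming plain numpy and a caller that feeds feature batches of shape `(batch_size, n_features)`. The class name `StreamingPCA` and its methods `partial_fit`, `finalize` and `transform` are hypothetical and not part of chemiscope. Only the running sum and the running sum of outer products are kept in memory, so the state is `O(n_features^2)` regardless of how many structures are processed.

```python
import numpy as np


class StreamingPCA:
    """Hypothetical sketch: PCA from a covariance matrix accumulated batch by batch."""

    def __init__(self, n_features, n_components):
        self.n_components = n_components
        self.n_samples = 0
        # running sum of feature vectors, shape (n_features,)
        self.sum = np.zeros(n_features)
        # running sum of outer products x x^T, shape (n_features, n_features)
        self.sum_outer = np.zeros((n_features, n_features))
        self.mean_ = None
        self.components_ = None
        self.explained_variance_ = None

    def partial_fit(self, batch):
        """Add one batch of shape (batch_size, n_features) to the accumulators."""
        batch = np.asarray(batch, dtype=np.float64)
        self.n_samples += batch.shape[0]
        self.sum += batch.sum(axis=0)
        self.sum_outer += batch.T @ batch
        return self

    def finalize(self):
        """Build the covariance matrix and diagonalize it once all batches are seen."""
        if self.n_samples < 2:
            raise ValueError("need at least two samples to estimate a covariance")
        self.mean_ = self.sum / self.n_samples
        # unbiased covariance: (sum x x^T - n * mean mean^T) / (n - 1)
        cov = (self.sum_outer - self.n_samples * np.outer(self.mean_, self.mean_)) / (
            self.n_samples - 1
        )
        # eigh returns eigenvalues in ascending order for symmetric matrices
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][: self.n_components]
        self.explained_variance_ = eigvals[order]
        # one principal axis per row, shape (n_components, n_features)
        self.components_ = eigvecs[:, order].T
        return self

    def transform(self, batch):
        """Project a batch onto the principal axes; also works batch by batch."""
        batch = np.asarray(batch, dtype=np.float64)
        return (batch - self.mean_) @ self.components_.T
```

Usage would be one pass over the dataset calling `partial_fit` on each batch, then `finalize()`, then a second pass calling `transform` to get the low-dimensional coordinates for the map. Accumulating raw sums is the simplest choice but can lose precision when features have a large mean relative to their spread; a pairwise (Chan et al.) merge of per-batch means and covariances is the more robust variant of the same approach.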
We (i.e. @sofiia-chorna) explored batched PCA a bit, but it was not better than full PCA at the time. If someone else wants to give it another go feel free though!
Another option that can be used immediately is a custom featurize function, which can apply alternative algorithms for dimensionality reduction without any change to chemiscope.