I have been using non-negative matrix factorization (NMF) for topic modelling (as an alternative to LDA) for a while now, but so far I have not been able to find a good R package for it. In my limited experience, the NMF package is a bit of a mess: it is heavily entangled with Bioconductor dependencies, which made it hard to get working, and when I did manage to run it, it seemed slow. The other two packages that can do NMF are NMFN and rNMF, and I have found both to be rather slow.
My solution so far has been to use reticulate:
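library(reticulate)
use_condaenv("r-reticulate")  # conda environment with scikit-learn installed
# Import the submodule directly rather than going through py_run_string()
decomposition <- import("sklearn.decomposition")
model <- decomposition$NMF(init = "nndsvd", n_components = 15L, random_state = 23L)
# your_matrix: a non-negative document-feature matrix
W <- model$fit_transform(your_matrix)  # document-by-topic weights
H <- model$components_                 # topic-by-feature weights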
This works well, but native R support would obviously be better. I don't know how difficult it would be to port the Python solution to R or to optimize the existing packages, but I thought I would raise it here in case you think this would be a worthy addition to the quanteda.textmodels family. I read your discussion about supporting LDA, but I think the way NMF works makes it somewhat more conducive to being supported directly here (plus, unlike LDA, there aren't good alternatives out there).
Greene and Cross 2017 take this a step further (and generally make the case for NMF topic modelling), but for starters a fast NMF decomposer that actually works (with text data) would be nice.
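To give a rough sense of what a "decomposer" involves, here is a minimal sketch of NMF in plain Python using the classic multiplicative update rule (Lee and Seung). This is an illustration only, not what scikit-learn runs (its default solver is coordinate descent), and it uses random initialization rather than "nndsvd". The helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`) and the toy matrix are made up for this example, and the pure-Python loops are far too slow for real text data.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of v - w h, the quantity the updates try to reduce."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, k, n_iter=200, seed=23, eps=1e-9):
    """Factorize non-negative v (docs x terms) into w (docs x k) and h (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random positive starting point (an assumption; sklearn's "nndsvd" is SVD-based).
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


# Toy document-term counts: two documents share one vocabulary, two share another.
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
W, H = nmf(V, k=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

Here `W` plays the same role as the output of `fit_transform` above (documents by topics) and `H` the role of `components_` (topics by terms). Because every update multiplies by a ratio of non-negative quantities, the factors stay non-negative without any explicit constraint handling.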