
Non-negative matrix factorization #44

Open
michalovadek opened this issue Dec 16, 2020 · 1 comment

Comments

@michalovadek

I have been using non-negative matrix factorization (NMF) for topic modelling (as an alternative to LDA) for a while now, but so far I have not been able to find a good R package for it. In my limited experience, the NMF package is a bit of a mess: it is weighed down by Bioconductor dependencies that made it hard to get working, and when I did manage to make it work, it seemed slow. The other two packages that can do NMF are NMFN and rNMF, and I have found both to be rather slow as well.

My solution so far has been to use reticulate:

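```r
library(reticulate)

use_condaenv("r-reticulate")

# Import scikit-learn's decomposition module directly
decomposition <- import("sklearn.decomposition")

# 15 topics with NNDSVD initialisation; Python expects integers, hence the L suffix
model <- decomposition$NMF(init = "nndsvd", n_components = 15L,
                           random_state = 23L)

# your_matrix: a non-negative document-term matrix (documents in rows)
W <- model$fit_transform(your_matrix)  # document-topic weights
H <- model$components_                 # topic-term weights
```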

This works well, but native R support would obviously be better. I don't know how difficult it would be to port the Python implementation to R or to optimize the existing packages, but I thought I would raise this here in case you think it would be a worthy addition to the quanteda.textmodels family. I read your discussion about supporting LDA, but I think the way NMF works makes it somewhat more amenable to being supported directly here (plus, unlike for LDA, there are no good alternatives out there).

Greene and Cross (2017) take this a step further (and generally make the case for NMF topic modelling), but for starters a fast NMF decomposer that actually works with text data would be nice.
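For a sense of what a dependency-free native version could start from, here is a minimal base-R sketch of NMF using the Lee and Seung multiplicative-update rules for the Frobenius loss. This is the same family of updates scikit-learn exposes as `solver = "mu"` (its default solver is coordinate descent), and it is not a port of scikit-learn's code. The function name `nmf_mu()` and its arguments are hypothetical, the initialisation is plain random rather than NNDSVD, and it works on a dense matrix, so it says nothing about sparse or large-scale performance.

```r
# Hypothetical helper: minimal NMF via Lee-Seung multiplicative updates,
# minimising ||V - W H||_F^2 subject to W >= 0 and H >= 0. Not from any package.
nmf_mu <- function(V, k, n_iter = 200L, seed = 23L, eps = 1e-10) {
  stopifnot(is.matrix(V), all(V >= 0), k >= 1)
  set.seed(seed)  # note: this resets the global RNG state
  n <- nrow(V)
  m <- ncol(V)
  # Random non-negative start (sklearn's init = "nndsvd" is not reproduced here)
  W <- matrix(runif(n * k), n, k)
  H <- matrix(runif(k * m), k, m)
  for (i in seq_len(n_iter)) {
    # H <- H * (W'V) / (W'W H); eps guards against division by zero
    H <- H * crossprod(W, V) / (crossprod(W) %*% H + eps)
    # W <- W * (V H') / (W H H')
    W <- W * tcrossprod(V, H) / (W %*% tcrossprod(H) + eps)
  }
  list(W = W, H = H)
}

# Usage on a toy 20 x 10 count matrix (rows = documents, columns = terms)
set.seed(1)
V <- matrix(rpois(200, 3), nrow = 20)
fit <- nmf_mu(V, k = 3)
err <- norm(V - fit$W %*% fit$H, type = "F")  # Frobenius reconstruction error
```

Both factors stay non-negative because each update only multiplies by a ratio of non-negative quantities. A version for quanteda would presumably need sparse-matrix support and a faster solver on top of this.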

@kbenoit
Contributor

kbenoit commented Dec 16, 2020

Agreed, this would be a very good addition. I saw a talk last month about guided NMF that was all about text as well.
https://arxiv.org/pdf/2010.11365.pdf
https://github.com/jvendrow/GuidedNMF

I am pretty sure we could get the R packages working more cleanly and for sparse inputs.
