How to do Hierarchical Topic Modeling on Merged Model? #2152

Open
1 task done
shivamtawari opened this issue Sep 16, 2024 · 3 comments
Labels
bug Something isn't working

Comments

shivamtawari commented Sep 16, 2024

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Describe the bug

Hi @MaartenGr
I am trying to visualize the hierarchical topics of two topic models merged with .merge_models.

hierarchical_topics_merged = merged_model.hierarchical_topics(docs_1+docs_2)

It produces the following error:

2024-09-16 09:47:47,878 - BERTopic - WARNING: No c-TF-IDF matrix was found despite it is supposed to be used (`use_ctfidf` is True). Defaulting to semantic embeddings.
---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
<ipython-input-24-5238a0008058> in <cell line: 1>()
----> 1 hierarchical_topics_merged = merged_model.hierarchical_topics(docs_3)

2 frames
/usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py in _check_vocabulary(self)
    506             self._validate_vocabulary()
    507             if not self.fixed_vocabulary_:
--> 508                 raise NotFittedError("Vocabulary not fitted or provided")
    509 
    510         if len(self.vocabulary_) == 0:

NotFittedError: Vocabulary not fitted or provided

How do I visualize merged models?

Thanks!

BERTopic Version

v0.16.3

shivamtawari added the bug label Sep 16, 2024
@MaartenGr (Owner)

I'm missing the full error log (the "2 frames" that are collapsed in your traceback). Without it, I can't say exactly what the problem is. Having said that, you can set use_ctfidf=False to solve your problem.

@shivamtawari (Author)

Hi, I forgot to include the complete error log. Here it is:

NotFittedError                            Traceback (most recent call last)
<ipython-input-24-5238a0008058> in <cell line: 1>()
----> 1 hierarchical_topics_merged = merged_model.hierarchical_topics(docs_3)

2 frames
/usr/local/lib/python3.10/dist-packages/bertopic/_bertopic.py in hierarchical_topics(self, docs, use_ctfidf, linkage_function, distance_function)
   1101         # and will be removed in 1.2. Please use get_feature_names_out instead.
   1102         if version.parse(sklearn_version) >= version.parse("1.0.0"):
-> 1103             words = self.vectorizer_model.get_feature_names_out()
   1104         else:
   1105             words = self.vectorizer_model.get_feature_names()

/usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py in get_feature_names_out(self, input_features)
   1483             Transformed feature names.
   1484         """
-> 1485         self._check_vocabulary()
   1486         return np.asarray(
   1487             [t for t, i in sorted(self.vocabulary_.items(), key=itemgetter(1))],

/usr/local/lib/python3.10/dist-packages/sklearn/feature_extraction/text.py in _check_vocabulary(self)
    506             self._validate_vocabulary()
    507             if not self.fixed_vocabulary_:
--> 508                 raise NotFittedError("Vocabulary not fitted or provided")
    509 
    510         if len(self.vocabulary_) == 0:

NotFittedError: Vocabulary not fitted or provided

@MaartenGr (Owner)

Ah, it seems that hierarchical_topics truly needs a fitted vectorizer in order to run. Hmm, the only thing I can think of that could solve this is running .update_topics with the documents of both models to recreate a vectorizer model before doing the hierarchical topic modeling.
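
A minimal sketch of that workaround, under stated assumptions: the helper name `rebuild_and_get_hierarchy` is hypothetical, and it has not been verified against a real merged model. It relies only on `update_topics` (which accepts `topics` and `vectorizer_model` and refits the vectorizer and c-TF-IDF) and on `hierarchical_topics`. One further assumption is that `topics` holds exactly one topic id per document in `docs`; a merged model's stored assignments may not line up with `docs_1 + docs_2`, so they may need to be supplied explicitly.

```python
def rebuild_and_get_hierarchy(merged_model, docs, topics=None, vectorizer_model=None):
    """Refit the vectorizer of a merged model, then compute its topic hierarchy.

    Hypothetical helper, not part of BERTopic.
    merged_model: object exposing update_topics() and hierarchical_topics()
    docs: all documents the merged topics should be described with
    topics: optional list with one topic id per document in docs
    vectorizer_model: optional vectorizer passed through to update_topics
    """
    if topics is not None and len(topics) != len(docs):
        raise ValueError("topics must contain exactly one topic id per document")

    # update_topics fits a vectorizer on docs and recomputes c-TF-IDF,
    # which is what the NotFittedError above reports as missing.
    merged_model.update_topics(docs, topics=topics, vectorizer_model=vectorizer_model)

    # With a fitted vectorizer in place, the hierarchy can be computed.
    return merged_model.hierarchical_topics(docs)


# Usage sketch (assumes merged_model, docs_1 and docs_2 from the question):
# docs = docs_1 + docs_2
# topics, _ = merged_model.transform(docs)  # assumption: one way to get aligned topics
# hierarchy = rebuild_and_get_hierarchy(merged_model, docs, topics=list(topics))
# merged_model.visualize_hierarchy(hierarchical_topics=hierarchy)
```

Note that `update_topics` replaces the existing topic representations with ones computed from the supplied documents and topic assignments, so the topic keywords may differ from those of the original models.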
