Topic Reduction #1134
econinomista asked this question in Q&A
-
Dear all,

I have a question regarding reducing the number of topics in BERTopic. I have trained a model on Twitter data and the 100 most frequent topics look very good to me. Now I would like to keep those 100 topics and use them to predict the rest of my data. I do not want to use model.reduce_topics, since the topics it produces are too broad. Is there a way to keep just my top 100 topics and run the predictions based on them, and has anyone already done something like this?

Best, and thanks in advance,
Nikola

-

If you want to keep only the 100 most frequent topics and do nothing else with the other topics, including their respective documents, then it might be worthwhile to use manual topic modeling. You would take the documents and labels of the top 100 most frequent topics and create a separate model that only learns to predict those 100 topics. Alternatively, you could merge all the other topics together and treat them as an outlier class; that way, you would not have to create a separate model.
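As a concrete starting point, here is a minimal sketch of the first suggestion that uses a plain scikit-learn classifier as the "separate model" (BERTopic's manual topic modeling feature would be another way to realize it). It assumes you already have the fitted model `topic_model`, the training documents `docs`, the topic assignments `topics` returned by `fit_transform`, and the unseen documents `new_docs`; those names are placeholders, not part of BERTopic itself. The last two lines sketch the second suggestion, where everything outside the top 100 is mapped to the outlier class instead.

```python
# Sketch only: assumes `docs`, `topics` (from topic_model.fit_transform(docs)),
# `topic_model`, and `new_docs` already exist in your script.
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1. Identify the 100 most frequent topics, ignoring the -1 outlier topic.
counts = Counter(t for t in topics if t != -1)
top_100 = {topic for topic, _ in counts.most_common(100)}

# 2. Keep only the documents assigned to those topics, together with their labels.
train_docs = [doc for doc, t in zip(docs, topics) if t in top_100]
train_labels = [t for t in topics if t in top_100]

# 3. Train a separate model that can only ever predict one of the 100 topics.
#    TF-IDF + logistic regression is just an illustration; any classifier works.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_docs, train_labels)
predicted = clf.predict(new_docs)

# Alternative: keep the original BERTopic model and map every prediction
# outside the top 100 to the outlier class (-1) instead.
bertopic_preds, _ = topic_model.transform(new_docs)
bertopic_preds = [t if t in top_100 else -1 for t in bertopic_preds]
```

Which route is preferable depends on whether you want documents that fall outside your 100 topics to be forced into one of them (the classifier) or simply marked as outliers (the mapping).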