Time series of topics for large texts and a response variable #1258
Thank you for making BERTopic available, and for the really helpful website and code examples.
I have a time series of fairly large texts, each hundreds to thousands of words long. I have four columns in my dataset:
I've been looking for a way to get topics for each
Since my texts are large, I split them into sentences. Half of them got assigned to topic
Is it okay to group and average the probabilities of the remaining topics? Will the result be representative of the whole text? Is it okay to take time differences between the probabilities for each topic?
It is quite possible that I'm missing something and there is a simpler approach to what I'm trying to do. Please suggest one if so. Thanks again!
Replies: 1 comment 2 replies
That should generally be okay if the sub-documents are of roughly equal size.
If I understand you correctly, then I think so, yes, since dynamic topic modeling in BERTopic does something similar.
If you want to statistically relate the topics to your response variable, it might be worthwhile to check out this thread, which demonstrates the use of covariate analysis in BERTopic.
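As a rough sketch of the two steps discussed here (averaging sentence-level probabilities per document, then differencing over time), assuming you already have one topic-probability vector per sentence, for example from a BERTopic model fitted with `calculate_probabilities=True`. The function names, the toy data, and the optional length weighting are illustrative assumptions and not part of BERTopic's API.

```python
from collections import defaultdict


def average_topic_probs(doc_ids, sentence_probs, weights=None):
    """Average sentence-level topic probabilities per parent document.

    doc_ids:        parent document id for each sentence
    sentence_probs: one list of topic probabilities per sentence
    weights:        optional per-sentence weights (e.g. word counts);
                    defaults to equal weighting
    """
    if weights is None:
        weights = [1.0] * len(doc_ids)
    sums = {}
    totals = defaultdict(float)
    for doc_id, probs, w in zip(doc_ids, sentence_probs, weights):
        acc = sums.setdefault(doc_id, [0.0] * len(probs))
        for k, p in enumerate(probs):
            acc[k] += w * p
        totals[doc_id] += w
    return {d: [s / totals[d] for s in acc] for d, acc in sums.items()}


def time_differences(ordered_doc_ids, doc_probs):
    """Change in each topic's averaged probability between consecutive documents."""
    diffs = []
    for prev, curr in zip(ordered_doc_ids, ordered_doc_ids[1:]):
        diffs.append([c - p for p, c in zip(doc_probs[prev], doc_probs[curr])])
    return diffs


# Hypothetical toy data: two documents (ordered in time), two topics.
doc_ids = ["2020-01", "2020-01", "2020-02", "2020-02"]
sentence_probs = [[0.75, 0.25], [0.25, 0.75], [1.0, 0.0], [0.5, 0.5]]
word_counts = [10, 30, 20, 20]

# Equal weighting is fine when sentences are of similar length.
equal = average_topic_probs(doc_ids, sentence_probs)
# Weighting by word count is one option when lengths differ a lot.
weighted = average_topic_probs(doc_ids, sentence_probs, weights=word_counts)
# Per-topic change from the first document to the second.
deltas = time_differences(["2020-01", "2020-02"], equal)
```

If the sub-documents differ a lot in length, the weighted variant is one way to keep short sentences from counting as much as long ones. Whether that is appropriate depends on your data.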