Time series of topics for large texts and a response variable #1258

Answered by MaartenGr
dragonattheend asked this question in Q&A
> Is it okay to group and average the probabilities of the remaining topics? Will it be representative of the whole text?

That should generally be okay if the sub-documents are of roughly equal size.
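
As a rough illustration of the averaging step, here is a minimal sketch in plain Python. It assumes you already have one row of topic probabilities per sub-document and an identifier that maps each sub-document back to its parent text. The function name `average_topic_probs` and the optional `weights` argument are hypothetical and not part of BERTopic; the weights (for example, sub-document lengths) are only one possible way to compensate when the sub-documents are not of similar size.

```python
from collections import defaultdict

def average_topic_probs(doc_ids, probs, weights=None):
    """Average per-sub-document topic probabilities into one row per parent document.

    doc_ids: parent document id for each sub-document
    probs:   one list of topic probabilities per sub-document
    weights: optional per-sub-document weights (e.g. token counts);
             defaults to equal weighting, i.e. a plain mean
    """
    if weights is None:
        weights = [1.0] * len(doc_ids)
    sums = {}
    totals = defaultdict(float)
    for doc_id, row, w in zip(doc_ids, probs, weights):
        acc = sums.setdefault(doc_id, [0.0] * len(row))
        for k, p in enumerate(row):
            acc[k] += w * p
        totals[doc_id] += w
    # Divide each accumulated row by the total weight of its document
    return {d: [v / totals[d] for v in acc] for d, acc in sums.items()}

# Toy data: document "a" was split into two sub-documents, "b" into one
doc_ids = ["a", "a", "b"]
probs = [[0.2, 0.8], [0.6, 0.4], [0.5, 0.5]]

doc_probs = average_topic_probs(doc_ids, probs)
weighted_probs = average_topic_probs(doc_ids, probs, weights=[1, 3, 1])
```

With equal weights this is a plain mean per document. Passing lengths as weights makes longer sub-documents count for more, which may be closer to "representative of the whole text" when the splits are uneven.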

> Is it okay to take time differences between probabilities for each topic?

If I understand you correctly, then yes, I think so, since dynamic topic modeling in BERTopic does something similar.
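
One way to read "time differences" is as first differences of the averaged topic probabilities between consecutive time periods. The sketch below assumes the rows are already ordered by time, with one row of per-topic probabilities per period; `topic_differences` is a hypothetical helper name, and this is not how BERTopic's dynamic topic modeling is implemented internally.

```python
def topic_differences(series):
    """Return first differences between consecutive time periods.

    series: list of rows ordered by time, each row holding one
            probability per topic
    Result has len(series) - 1 rows; entry [t][k] is the change in
    topic k from period t to period t + 1.
    """
    return [
        [curr[k] - prev[k] for k in range(len(curr))]
        for prev, curr in zip(series, series[1:])
    ]

# Toy data: three time periods, two topics
series = [[0.4, 0.6], [0.5, 0.5], [0.7, 0.3]]
diffs = topic_differences(series)
```

Each row of `diffs` can then be lined up with the response variable for the matching period, keeping in mind that the first period has no difference and drops out.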

> It is very possible that I'm missing something and there is a simpler approach to what I'm trying to do. Please suggest if so. Thanks again!

If you want to statistically compare the response variable, it might be worthwhile to check out this thread that demonstrates the use of co…

Answer selected by dragonattheend