How can I tell BERTopic to only look at nouns? #1089
JubinaMarie
Yes, you can look at only the nouns using the |
First, thank you so much for BERTopic. I am a complete newbie (at Python, Pandas, all of it) and I am amazed at what I've been able to do with it so far; it is making topic modeling accessible. I have been trying lots of different things to get the best possible results for my use case, and learning a lot along the way.
I've noticed that the documents themselves contain a lot of irrelevant text that BERTopic clusters on when I don't want it to. For example, let's say I want it to find all the documents about gas station shoplifting and treat them as one topic. Unfortunately, those documents include the names of hundreds of different gas stations and often include the address and other miscellaneous details about the event, including random SKU numbers of the items taken. Meanwhile, all I care about is that it's a gas station and there was shoplifting. I tried seeding with words like "gas, station, shoplift" and, oddly, that didn't seem to help at all (any thoughts on why?). Instead, it tends to treat all the thefts at a particular station name as one topic. If I reduce the number of topics, I get too much noise: other unrelated crimes get mixed in.
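To make the kind of cleanup described above concrete, here is a rough sketch of stripping that detail from each document before it reaches BERTopic. It is only an illustration under assumptions: the function name `clean_document`, the station list, and the sample report are all hypothetical, and the rule "drop any token containing a digit" is a blunt stand-in for real address and SKU handling.

```python
import re

# Any whitespace-delimited token that contains a digit (street numbers,
# store numbers, SKU codes). Deliberately blunt; tune for real data.
DIGIT_TOKEN = re.compile(r"\S*\d\S*")


def clean_document(text, station_names):
    """Replace known station names with a generic phrase and drop digit tokens.

    `station_names` is a hypothetical list you would have to build yourself.
    """
    for name in station_names:
        text = re.sub(re.escape(name), "gas station", text, flags=re.IGNORECASE)
    text = DIGIT_TOKEN.sub(" ", text)
    # Collapse the extra whitespace left behind by the substitutions.
    return " ".join(text.split())


# Made-up example report, not real data.
report = "Shoplifting at QuikFuel #12, 123 Main St, item SKU-48213 taken"
cleaned = clean_document(report, ["QuikFuel"])
```

The cleaned strings, rather than the raw reports, would then be the documents passed to the topic model. Street names and other words without digits survive this pass, so it reduces the noise rather than removing it.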
One idea I had was to try having BERTopic look only at nouns. Is there a way to tell it to do that? Any other ideas? Thanks!
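For what it's worth, one possible way to approach the nouns idea is to filter each document down to its nouns before handing it to BERTopic. The sketch below is an assumption-laden illustration, not a confirmed BERTopic feature: `keep_nouns` and `tag_with_spacy` are hypothetical helper names, and the tagging step assumes spaCy and its `en_core_web_sm` model are installed. Keeping only the Universal POS tag `NOUN` (and not `PROPN`) would also drop proper nouns, which is where most station names should end up if the tagger gets them right.

```python
from functools import lru_cache

# Universal POS tag for common nouns; "PROPN" (proper nouns) is left out
# on purpose so that names of places and businesses are dropped.
NOUN_TAGS = frozenset({"NOUN"})


def keep_nouns(tagged_tokens, tags=NOUN_TAGS):
    """Join the tokens whose part-of-speech tag is in `tags`.

    `tagged_tokens` is an iterable of (token, pos_tag) pairs.
    """
    return " ".join(token for token, pos in tagged_tokens if pos in tags)


@lru_cache(maxsize=1)
def _load_spacy(model_name="en_core_web_sm"):
    # Assumption: spaCy and this model are installed. Imported lazily so
    # the rest of the sketch runs without them.
    import spacy
    return spacy.load(model_name)


def tag_with_spacy(text):
    """Return (lowercased lemma, POS tag) pairs for one document."""
    return [(tok.lemma_.lower(), tok.pos_) for tok in _load_spacy()(text)]


# Hand-tagged example so the filter can be tried without spaCy.
example = [
    ("shoplifting", "NOUN"),
    ("at", "ADP"),
    ("QuikFuel", "PROPN"),
    ("station", "NOUN"),
]
nouns_only = keep_nouns(example)  # "shoplifting station"
```

With spaCy available, the filtered corpus would be something like `[keep_nouns(tag_with_spacy(d)) for d in docs]`, and that list is what would go into `BERTopic().fit_transform(...)`. Filtering the documents themselves matters here: restricting only the vectorizer changes the words used to describe each topic, but the clusters come from embeddings of the full text, so the station names would still influence them.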