Added seed phrases to KeyNMF #77
base: main
@KennethEnevoldsen can I has review?
Looking great. A few ideas to restructure the docs; nothing holding back this PR though.
@@ -20,42 +20,26 @@
- Lemmatization and Stemming
- Visualization with [topicwizard](https://github.com/x-tabdeveloping/topicwizard) 🖌️

## New in version 0.12.0: Seeded topic modeling
Seems like you should really keep a changelog (too many important tidbits in these that people likely miss out on).
You could potentially do it in a dropdown menu ("See previous versions (click to unfold)")
@@ -8,20 +8,30 @@ while taking inspiration from classical matrix-decomposition approaches for extr
<figcaption>Schematic overview of KeyNMF</figcaption>
</figure>

Here's an example of how you can fit and interpret a KeyNMF model in the easiest way.
- Here's an example of how you can fit and interpret a KeyNMF model in the easiest way.
+ Here's an example of how you can fit and interpret a KeyNMF model.
model.fit(corpus)

model.print_topics()
```

!!! question "Which Embedding model should I use"
- You should probably use KeyNMF with a `paraphrase-` type embedding model. These seem to perform best in most tasks. Some examples include:
- [paraphrase-MiniLM-L3-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L3-v2) - Absolutely tiny :mouse:
I would focus on speed; not everyone will know that size and speed are related.
Seems like this is a bit redundant if you can simply use the static-retrieval-mrl-en-v1?
In KeyNMF, you can describe this aspect, from which you want to investigate your corpus, using a free-text seed-phrase,
which will then be used to only extract topics, which are relevant to your research question.
Has this idea been explored before? If so, a reference would be great.
@@ -354,46 +424,49 @@ for batch in batched(zip(corpus, timestamps)):
model.partial_fit_dynamic(text_batch, timestamps=ts_batch, bins=bins)
```

### Hierarchical Topic Modeling
## Asymmetric and Instruction-tuned Embedding Models
Some of these things are specifically in the KeyNMF docs, why not put them in a general section?
### Asymmetric and Instruction-tuned Embedding Models
## Seeded Topic Modeling
Do you need to refer to the general documentation of seeded topic modeling here (seems like a lot of duplication)?
Might it not be better to create a table like "supported types of topic modelling" and then in the "Seeded Topic Modelling" section add "Models which support seeded topic modelling".
Some models are able to account for this by taking seed phrases or words.
This is currently only possible with KeyNMF in Turftopic, but will likely be extended in the future.

In [KeyNMF](../keynmf.md), you can describe the aspect, from which you want to investigate your corpus, using a free-text seed-phrase,
See the comment above as well.
I would probably write this more simply and then put it in a Tab section called KeyNMF (that way it is easy to see that the only one supported is KeyNMF, but also that there could be others in the future).
@@ -120,6 +120,8 @@ def batch_extract_keywords(
self,
documents: list[str],
embeddings: Optional[np.ndarray] = None,
seed_embedding: Optional[np.ndarray] = None,
fitting: bool = True,
What does `fitting` do?
You can now add a `seed_phrase` to a KeyNMF model, which essentially indicates the aspect from which the model has to examine documents.

TODO:
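
To illustrate the idea of a seed phrase steering keyword extraction, here is a toy sketch in plain Python. It is not the code from this PR: the 2-D word vectors, the helper names (`embed`, `cosine`, `seeded_keywords`), and the weighting rule (keyword importance multiplied by the document's similarity to the seed) are all assumptions made for illustration. A real model would use a sentence encoder and the library's own logic.

```python
import math

# Hypothetical 2-D "embeddings" (axis 0 ~ sports, axis 1 ~ politics).
# These stand in for a real sentence encoder and are invented for this sketch.
WORD_VECTORS = {
    "match": (1.0, 0.1),
    "goal": (0.9, 0.0),
    "stadium": (0.8, 0.3),
    "election": (0.0, 1.0),
    "senate": (0.1, 0.9),
    "vote": (0.1, 1.0),
}


def embed(text):
    """Average the toy vectors of the known words in a text."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def seeded_keywords(document, seed_phrase, top_n=2):
    """Rank a document's words, down-weighting documents unrelated to the seed.

    Assumed rule (not confirmed by the PR): each word's similarity to the
    document is multiplied by the document's similarity to the seed phrase,
    with negative similarities clipped to zero.
    """
    doc_emb = embed(document)
    seed_emb = embed(seed_phrase)
    relevance = max(cosine(doc_emb, seed_emb), 0.0)
    scores = {}
    for word in set(document.lower().split()):
        if word in WORD_VECTORS:
            importance = max(cosine(WORD_VECTORS[word], doc_emb), 0.0)
            scores[word] = importance * relevance
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    seed = "election vote"
    print(seeded_keywords("senate vote election", seed))
    print(seeded_keywords("goal match stadium", seed))
```

With a politics-flavored seed, the keywords of the politics document keep scores close to their unseeded values, while those of the sports document shrink toward zero, so a downstream decomposition would mostly see the aspect named by the seed.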