Added seed phrases to KeyNMF #77

x-tabdeveloping · 2025-01-31T13:34:11Z

You can now pass a `seed_phrase` to a KeyNMF model. The phrase specifies the aspect from which the model should examine the documents.

```python
from sklearn.datasets import fetch_20newsgroups

from turftopic import KeyNMF

corpus = fetch_20newsgroups(
    subset="all",
    remove=("headers", "footers", "quotes"),
).data

model = KeyNMF(5, seed_phrase="Is homosexuality moral?")
model.fit(corpus)

model.print_topics()
```

Topic ID	Highest Ranking
0	homosexuality, homosexual, immoral, sodom, heterosexual, sexual, fornication, christians, verses, sex
1	morality, moral, immoral, morals, objective, morally, society, animals, behavior, natural
2	christians, christian, christianity, religion, bible, god, church, religious, faith, beliefs
3	homosexual, homosexuals, heterosexual, gay, sexual, sex, heterosexuals, straight, men, sexuality
4	sin, sins, god, sinner, sinful, condemnation, sinned, scripture, punishment, sinners
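
For readers new to the idea, here is a toy sketch of why a seed phrase narrows the topics. It is not the code in this PR: `seeded_keyword_scores`, the 2-D vectors, and the multiplicative weighting are all assumptions made up for illustration. It only shows that keywords unrelated to the seed can be pushed out of the keyword set before any decomposition happens.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def seeded_keyword_scores(doc_vec, word_vecs, seed_vec=None):
    """Hypothetical helper: score candidate keywords for one document.

    Without a seed, a word's score is its (non-negative) similarity to the
    document. With a seed, that score is additionally weighted by the word's
    similarity to the seed, so off-aspect words shrink toward zero.
    This weighting scheme is an assumption, not turftopic's implementation.
    """
    scores = {}
    for word, vec in word_vecs.items():
        score = max(cosine(doc_vec, vec), 0.0)
        if seed_vec is not None:
            score *= max(cosine(seed_vec, vec), 0.0)
        if score > 0:
            scores[word] = score
    return scores


# Toy 2-D "embeddings": the first axis stands for the seeded aspect.
word_vecs = {"moral": [1.0, 0.0], "hockey": [0.0, 1.0], "ethics": [0.8, 0.6]}
doc_vec = [0.6, 0.8]
seed_vec = [1.0, 0.0]

unseeded = seeded_keyword_scores(doc_vec, word_vecs)
seeded = seeded_keyword_scores(doc_vec, word_vecs, seed_vec)
print(sorted(unseeded))
print(sorted(seeded))
```

In this toy setup "hockey" is a keyword for the document when no seed is given, but drops out once the seed is applied, which mirrors how the topics above all stay close to the seeded question.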

TODO:

Add documentation
Release new version

x-tabdeveloping · 2025-02-10T14:05:07Z

@KennethEnevoldsen can I has review?

KennethEnevoldsen

Looking great. A few ideas to restructure the docs, but nothing holding back this PR.

KennethEnevoldsen · 2025-02-10T14:49:34Z

README.md

@@ -20,42 +20,26 @@
 - Lemmatization and Stemming
 - Visualization with [topicwizard](https://github.com/x-tabdeveloping/topicwizard) 🖌️

+## New in version 0.12.0: Seeded topic modeling


Seems like you should really keep a changelog (too many important tidbits in these that people likely miss out on).

You could potentially do it in a dropdown menu ("See previous versions (click to unfold)")

KennethEnevoldsen · 2025-02-10T14:50:49Z

docs/KeyNMF.md

@@ -8,20 +8,30 @@ while taking inspiration from classical matrix-decomposition approaches for extr
  <figcaption>Schematic overview of KeyNMF</figcaption>
 </figure>

+
 Here's an example of how you can fit and interpret a KeyNMF model in the easiest way.


Suggested change

Here's an example of how you can fit and interpret a KeyNMF model in the easiest way.

Here's an example of how you can fit and interpret a KeyNMF model.

KennethEnevoldsen · 2025-02-10T14:52:33Z

docs/KeyNMF.md

 model.fit(corpus)

 model.print_topics()
 ```

+!!! question "Which Embedding model should I use"
+    - You should probably use KeyNMF with a `paraphrase-` type embedding model. These seem to perform best in most tasks. Some examples include:
+        - [paraphrase-MiniLM-L3-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L3-v2) - Absolutely tiny :mouse: 


I would focus on speed; not everyone will know that size and speed are related.

Seems like this is a bit redundant if you can simply use the static-retrieval-mrl-en-v1?

KennethEnevoldsen · 2025-02-10T14:56:01Z

docs/KeyNMF.md

+In KeyNMF, you can describe this aspect, from which you want to investigate your corpus, using a free-text seed-phrase,
+which will then be used to only extract topics, which are relevant to your research question.


Has this idea been explored before? If so, a reference would be great.

KennethEnevoldsen · 2025-02-10T14:59:02Z

docs/KeyNMF.md

@@ -354,46 +424,49 @@ for batch in batched(zip(corpus, timestamps)):
    model.partial_fit_dynamic(text_batch, timestamps=ts_batch, bins=bins)
 ```

-### Hierarchical Topic Modeling
+## Asymmetric and Instruction-tuned Embedding Models


Some of these things are specifically in the KeyNMF docs, why not put them in a general section?

KennethEnevoldsen · 2025-02-10T15:02:18Z

docs/KeyNMF.md


-### Asymmetric and Instruction-tuned Embedding Models
+## Seeded Topic Modeling


Do you need to refer to the general documentation of seeded topic modeling here? (Seems like a lot of duplication.)

Might it not be better to create a table like "supported types of topic modelling" and then, in the "Seeded Topic Modelling" section, add "Models which support seeded topic modelling"?

KennethEnevoldsen · 2025-02-10T15:04:55Z

docs/seeded.md

+Some models are able to account for this by taking seed phrases or words.
+This is currently only possible with KeyNMF in Turftopic, but will likely be extended in the future.
+
+In [KeyNMF](../keynmf.md), you can describe the aspect, from which you want to investigate your corpus, using a free-text seed-phrase,


See the comment above as well.

I would probably write this more simply and then put it in a Tab section called KeyNMF (that way it is easy to see that the only model currently supported is KeyNMF, but also that there could be others in the future).

KennethEnevoldsen · 2025-02-10T15:05:52Z

turftopic/models/_keynmf.py

@@ -120,6 +120,8 @@ def batch_extract_keywords(
        self,
        documents: list[str],
        embeddings: Optional[np.ndarray] = None,
+        seed_embedding: Optional[np.ndarray] = None,
+        fitting: bool = True,


What does `fitting` do?

x-tabdeveloping added 6 commits January 31, 2025 14:29

Added seed phrases to KeyNMF

e282104

Fixed transform() in KeyNMF

560682e

Updated docs in KeyNMF

c66ec54

Added doc page for seeded modeling

12402f9

Updated readme

bb55d15

Bumped version

b22acdc

x-tabdeveloping requested a review from KennethEnevoldsen February 1, 2025 11:41

x-tabdeveloping changed the title ~~WIP: Added seed phrases to KeyNMF~~ Added seed phrases to KeyNMF Feb 1, 2025

KennethEnevoldsen reviewed Feb 10, 2025
