How to calculate entropy with Bertopic #2186

Open
1 task done
bruceszq opened this issue Oct 16, 2024 · 1 comment
Labels
question (Further information is requested)

Comments

bruceszq commented Oct 16, 2024

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Describe the bug

Hi everyone, first of all I would like to thank @MaartenGr and all the contributors for this amazing project.
For my project, I need to calculate the entropy of each topic. Could you help me work out how to calculate entropy in BERTopic? I tried using probs for the calculation, but it failed because probs was a one-dimensional array, whereas my code requires a two-dimensional array. Thank you very much!

Reproduction

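import numpy as np
import pandas as pd

# probs is assumed to be the probabilities returned by BERTopic's fit_transform
doc_topic_matrix = np.array(probs)

# Normalize each document's row so that it sums to 1 (this needs a 2-D array)
normalized_doc_topic_matrix = doc_topic_matrix / doc_topic_matrix.sum(axis=1, keepdims=True)

# Sum -p * log2(p) over documents to get one value per topic; 1e-9 avoids log2(0)
topic_entropy = (-normalized_doc_topic_matrix * np.log2(normalized_doc_topic_matrix + 1e-9)).sum(axis=0)

entropy_df = pd.DataFrame({'Topic': range(len(topic_entropy)), 'Entropy': topic_entropy})

# topic_freq is assumed to be an existing DataFrame with one row per topic, in the same order
topic_freq['Entropy'] = entropy_df['Entropy'].values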

BERTopic Version

0.16.4

bruceszq added the bug (Something isn't working) label Oct 16, 2024
MaartenGr (Owner) commented

To get 2-dimensional probabilities, you need to set calculate_probabilities=True when initializing BERTopic.
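As a rough illustration of how this could fit together, the sketch below computes one entropy value per topic from a 2-dimensional document-topic matrix. It rests on an assumption about what "entropy of each topic" means: here each topic's column is normalized over documents and its Shannon entropy is taken in bits, which may not be the definition the original poster had in mind. The helper name `topic_entropy` and the `demo` matrix are hypothetical, and `docs` in the commented usage is an assumed list of strings.

```python
import numpy as np


def topic_entropy(probs):
    """Shannon entropy (in bits) of each topic's distribution over documents.

    Hypothetical helper, not part of BERTopic. Expects a 2-D array of shape
    (n_documents, n_topics).
    """
    matrix = np.asarray(probs, dtype=float)
    if matrix.ndim != 2:
        raise ValueError(
            "expected a 2-D document-topic matrix; "
            "was calculate_probabilities=True set?"
        )
    # Normalize each topic column so it sums to 1 over documents.
    col_sums = matrix.sum(axis=0, keepdims=True)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)  # avoid division by zero
    p = matrix / safe_sums
    # Treat 0 * log2(0) as 0 by only taking the log where p > 0.
    logs = np.log2(p, where=p > 0, out=np.zeros_like(p))
    return -(p * logs).sum(axis=0)


# Assumed usage with BERTopic (docs is a list of strings you supply):
# from bertopic import BERTopic
# topic_model = BERTopic(calculate_probabilities=True)
# topics, probs = topic_model.fit_transform(docs)
# entropies = topic_entropy(probs)

# Tiny made-up matrix: 2 documents, 2 topics.
demo = np.array([[0.5, 0.0],
                 [0.5, 1.0]])
entropies = topic_entropy(demo)
```

In the made-up matrix, the first topic is spread evenly over both documents and the second is concentrated in one, so the first should get the higher entropy. If entropy per document is wanted instead, the same idea applies with the normalization and sum taken along axis=1.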

MaartenGr added the question (Further information is requested) label and removed the bug (Something isn't working) label Oct 18, 2024