Add index pointer of each token within doc_freq #80

Open
wants to merge 1 commit into
base: main
from

Conversation

eramirem commented Aug 8, 2024

Problem

To build a scipy.sparse array from the output of BM25Encoder._encode_single_document, we cannot use the returned mmh3 hashes as indices. We need instead the positions of the non-null keywords within the document frequency object (doc_freq).

Solution

Add a flag to _encode_single_document that specifies whether it should return the index positions.
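
To illustrate the idea, here is a minimal sketch of the hash-to-position mapping. It is not the code in this PR. It assumes doc_freq behaves like a dict keyed by token hash; the helper name hashes_to_positions and the sample hash values are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple


def hashes_to_positions(
    token_hashes: List[int],
    values: List[float],
    doc_freq: Dict[int, int],
) -> Tuple[List[int], List[float]]:
    """Map each token hash to its position within doc_freq.

    Hypothetical helper: assumes doc_freq is a dict keyed by token hash,
    so a key's position is its place in insertion order.
    Hashes missing from doc_freq are skipped along with their values.
    """
    # Compact lookup: hash -> 0-based position in doc_freq
    position_of = {h: pos for pos, h in enumerate(doc_freq)}

    positions: List[int] = []
    kept_values: List[float] = []
    for h, v in zip(token_hashes, values):
        pos = position_of.get(h)
        if pos is None:
            continue  # token not present in the document frequency object
        positions.append(pos)
        kept_values.append(v)
    return positions, kept_values


# Made-up hashes standing in for mmh3 output
doc_freq = {3812345678: 4, 17: 2, 2954001122: 7}
positions, kept_values = hashes_to_positions(
    [2954001122, 999, 17], [0.5, 0.1, 0.25], doc_freq
)
```

The resulting positions are bounded by len(doc_freq), so they could serve as column indices when building a sparse row, for example with scipy.sparse.csr_matrix((kept_values, ([0] * len(positions), positions)), shape=(1, len(doc_freq))).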

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Describe specific steps for validating this change.

1 participant