
Skip existing embeddings #2017

Open: wants to merge 8 commits into main

@shanbady (Contributor) commented Feb 6, 2025

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/6670

Description (What does it do?)

This PR makes embedding generation for both learning resources and contentfiles skip existing embeddings by default. To force re-embedding, pass the `--overwrite` flag to the `generate_embeddings` command.
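The skip-by-default behaviour can be illustrated with a minimal sketch. This is not the PR's code: it assumes a plain dict as a stand-in for the Qdrant collection and a toy `embed` callable in place of fastembed or litellm, and the function name `generate_embeddings` here is a hypothetical helper, not the management command itself.

```python
from typing import Callable

# Hypothetical stand-in: the real code stores vectors in Qdrant;
# here a plain dict maps item ID -> embedding vector.
Vector = list[float]


def generate_embeddings(
    items: dict[str, str],
    store: dict[str, Vector],
    embed: Callable[[str], Vector],
    overwrite: bool = False,
) -> list[str]:
    """Embed each item's text and write it to `store`.

    Items whose ID already has an embedding are skipped unless
    `overwrite` is True. Returns the IDs that were (re)embedded.
    """
    # Collect existing IDs once up front (analogous to one batch lookup
    # against the vector store) instead of checking per item.
    existing: set[str] = set() if overwrite else set(store)
    embedded: list[str] = []
    for item_id, text in items.items():
        if item_id in existing:
            continue  # keep the existing embedding, skip the expensive call
        store[item_id] = embed(text)
        embedded.append(item_id)
    return embedded


if __name__ == "__main__":
    store: dict[str, Vector] = {}
    fake_embed = lambda text: [float(len(text))]  # toy embedding model
    items = {"resource-1": "intro", "contentfile-7": "lecture notes"}
    print(generate_embeddings(items, store, fake_embed))  # → ['resource-1', 'contentfile-7']
    print(generate_embeddings(items, store, fake_embed))  # → []
    print(generate_embeddings(items, store, fake_embed, overwrite=True))  # → ['resource-1', 'contentfile-7']
```

Looking up existing IDs once per batch keeps the number of vector-store round trips independent of how many items are skipped; whether the PR batches its lookups this way is not shown here.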

How can this be tested?

  1. Check out this branch.
  2. Restart celery.
  3. Get the ID of a learning resource that has associated contentfiles and pass it to the generate_embeddings command: `python manage.py generate_embeddings --resource-ids <the id>`
  4. Observe the celery container's output. There should be output indicating that embeddings are being generated (the output differs depending on whether fastembed or ollama via litellm is used).
  5. Once that completes, verify that the embeddings appear on the qdrant dashboard.
  6. While keeping an eye on the celery output, re-run the embedding command.
  7. Note that the existing embeddings are now skipped.
  8. Re-run the command with the overwrite flag, `python manage.py generate_embeddings --resource-ids <the id> --overwrite`, and note that embeddings are generated once again.
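The test steps rely on `--overwrite` being off unless explicitly passed. A minimal sketch of how such flags could be declared with `argparse` follows; Django's `BaseCommand.add_arguments()` receives an `argparse` parser, so the same `add_argument` calls would apply there. The PR's actual argument definitions are not shown, and the comma-separated format for `--resource-ids` and the `parse_ids` helper are assumptions for illustration only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical standalone parser; a Django command would make the same
    # add_argument calls inside its add_arguments(self, parser) method.
    parser = argparse.ArgumentParser(prog="generate_embeddings")
    parser.add_argument(
        "--resource-ids",
        dest="resource_ids",
        default="",
        help="learning resource IDs (assumed: one comma-separated string)",
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",  # absent -> False, so existing embeddings are skipped
        help="re-embed even if embeddings already exist",
    )
    return parser


def parse_ids(raw: str) -> list[int]:
    """Split an assumed comma-separated ID string into ints."""
    return [int(part) for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args(["--resource-ids", "42", "--overwrite"])
    print(parse_ids(args.resource_ids), args.overwrite)  # → [42] True
```

Using `store_true` makes skipping the safe default: a plain re-run (step 6) cannot trigger a full re-embed by accident.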

@shanbady changed the title from Shanbady/skip existing cf embeddings to Skip existing embeddings on Feb 6, 2025
@shanbady added the Work in Progress and Needs Review labels and removed the Work in Progress label on Feb 6, 2025
@shanbady marked this pull request as ready for review on February 6, 2025 19:27
@rhysyngsun self-assigned this on Feb 6, 2025
Labels
Needs Review An open Pull Request that is ready for review
2 participants