Skip to content

Implemented KVCacheAwareScorer #34

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

vMaroon
Copy link

@vMaroon vMaroon commented Apr 19, 2025

Summary

Implemented KVCacheAwareScorer which uses kvcache.Indexer as an embedded module.
The latter is configured with default configuration + an env settable Redis server address, and a HuggingFace token which is an acceptable practice.

Changes:

  • This PR introduces some changes to the scheduler flow since this is the first scorer that has to be prompt-aware.
  • This PR also introduces minor code refactoring to the previous work.
  • The Dockerfile was changed:
    • An implication of embedding the kvcache.Indexer directly is having to statically link the HuggingFace Rust tokenizer's Go bindings.
    • Due to the kvcache-manager repo being internal, having Go download its modules requires configuring access. Therefore GIT_NM_USER and NM_TOKEN have to be set.
  • The Makefile's image-build target was slightly refactored and updated to support the Dockerfile changes, and building from non amd64 archs.

Follow-up work:

  • Scorers configuration

@vMaroon vMaroon changed the base branch from kvcache-aware to dev April 23, 2025 21:59
@vMaroon
Copy link
Author

vMaroon commented Apr 23, 2025

The force-push rebases onto dev and reimplements the KVCacheScorer along with the new missing logic.

@vMaroon vMaroon force-pushed the kvcache-aware branch 2 times, most recently from cb10c4b to 83a91a6 Compare April 24, 2025 15:45
@vMaroon
Copy link
Author

vMaroon commented Apr 24, 2025

The force push undos a nolint directive that failed to fix the buggy CI linting. The linting error makes no sense, I think we can ignore it or followup later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants