
[Feature]: Accurate Token Counting in LiteLLM with vLLM-Hosted Models (Gemma, Mistral, etc.) #8244

Open
thverney-dozo opened this issue Feb 4, 2025 · 0 comments
Labels
enhancement (New feature or request), mlops, user request

Comments

thverney-dozo commented Feb 4, 2025

The Feature

vLLM (and similar backends like Ollama) provides dedicated routes to retrieve the exact token count for a given prompt (e.g. vLLM's /tokenize endpoint), based on the tokenizer of the loaded model.

  • LiteLLM should be able to natively query the tokenizer used by vLLM, instead of defaulting to tiktoken.
  • Ideally, LiteLLM would pull tokenization logic directly from the vLLM-hosted model (when available), ensuring full compatibility with models like Gemma 2, Mistral, etc.
  • Alternatively, if vLLM exposes its token counting as an API route, LiteLLM could simply delegate token counting to vLLM when connected (a sketch of this fallback logic follows below).
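A minimal sketch of what that delegation could look like. This is not LiteLLM's actual internal API; it assumes a vLLM OpenAI-compatible server on http://localhost:8000 whose /tokenize response includes a `count` field (the exact response shape may vary by vLLM version):

```python
import requests
import tiktoken

VLLM_BASE_URL = "http://localhost:8000"  # assumption: local vLLM OpenAI-compatible server


def count_tokens(model: str, prompt: str) -> int:
    """Hypothetical delegation logic: ask the vLLM backend for an exact
    count via its /tokenize route, and only fall back to tiktoken if the
    backend is unreachable or does not expose the route."""
    try:
        resp = requests.post(
            f"{VLLM_BASE_URL}/tokenize",
            json={"model": model, "prompt": prompt},
            timeout=5,
        )
        resp.raise_for_status()
        # vLLM tokenizes with the loaded model's own tokenizer and
        # reports the number of tokens.
        return resp.json()["count"]
    except (requests.RequestException, KeyError):
        # Fallback: approximate with tiktoken (roughly what LiteLLM does
        # today), which can be inaccurate for SentencePiece-based models.
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(prompt))
```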

At the moment, the only workaround when running locally is to call vLLM's own API directly instead of going through LiteLLM for token counting, for example:
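For reference, the workaround looks roughly like this; it assumes vLLM's /tokenize route also accepts chat messages and applies the model's chat template (behaviour and field names may differ across vLLM versions, and the model name here is just an example):

```python
import requests

# Assumption: a local vLLM server with a Gemma 2 model loaded.
resp = requests.post(
    "http://localhost:8000/tokenize",
    json={
        "model": "google/gemma-2-9b-it",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
    },
    timeout=5,
)
# Exact token count from the model's own tokenizer, chat template included.
print(resp.json()["count"])
```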

Otherwise, thank you for your work :)

Motivation, pitch

I'm working with LiteLLM to centralise several model endpoints behind one unique endpoint.

Currently, LiteLLM offers a /utils/token_counter route that can count tokens when plugged into a vLLM instance. However, this feature appears to have limited compatibility:

  • It works well for some models but falls back to tiktoken when the model's tokenizer is unsupported, which yields inaccurate counts for many architectures.
  • When using vLLM-hosted models like Gemma 2 or Mistral, the token count retrieved by LiteLLM can be inconsistent or incorrect, as these models rely on their own tokenizers (e.g. SentencePiece-based) rather than OpenAI's BPE vocabularies.
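To make the mismatch concrete, one can compare the proxy's count against the backend's. A minimal sketch, assuming a LiteLLM proxy on http://localhost:4000 routing the same model; the exact response fields (e.g. `total_tokens`, `tokenizer_type`) may vary by LiteLLM version:

```python
import requests

# Assumption: LiteLLM proxy on :4000 in front of the vLLM server above.
resp = requests.post(
    "http://localhost:4000/utils/token_counter",
    json={
        "model": "gemma-2-9b-it",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
    },
    timeout=5,
)
body = resp.json()
# If a tokenizer_type field is present, it shows whether LiteLLM used the
# model's tokenizer or fell back to tiktoken; compare total_tokens with
# the count returned by vLLM's /tokenize route.
print(body.get("total_tokens"), body.get("tokenizer_type"))
```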

Are you a ML Ops Team?

Yes

Twitter / LinkedIn details

No response

thverney-dozo added the enhancement (New feature or request) label on Feb 4, 2025