
Use LLM APIs responses in token counting #5604

Merged: 11 commits into main from enyst/usage on Feb 23, 2025

Conversation

@enyst (Collaborator) commented Dec 14, 2024

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR proposes to use the liteLLM token usage data explicitly.

  • track it in Metrics
  • add two utility methods to link it to events

Please note: technically we also have it in the tool_call_metadata optional field in Event, because we have the entire litellm ModelResponse there. This PR proposes to take it out of a ModelResponse, for a few reasons:

  • it's not always there: MessageActions, some FinishActions, and user actions don't have a tool. We can fill the data for those from elsewhere (e.g. tokenizer on the corresponding message)
  • a dedicated format, shaped the way we need it, seems easier to work with than the raw format litellm returns, though I could be wrong
  • we may want to remove the full ModelResponse from events sometime, it contains a lot we don't really need. Events are supposed to be sort of simple and higher level, while the technical details of an API response in litellm are quite complex and of a different nature.
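
For orientation, here is a minimal sketch of the shape this could take. The names below (TokenUsage, Metrics.add_token_usage, the cache_* fields) are illustrative assumptions, not necessarily the exact ones in the diff:

    # Illustrative sketch: record the token counts reported by the LLM API
    # (litellm's Usage object) per response_id, so they can later be linked
    # back to the events that produced them.
    from pydantic import BaseModel


    class TokenUsage(BaseModel):
        model: str = ''
        prompt_tokens: int = 0
        completion_tokens: int = 0
        cache_read_tokens: int = 0   # assumed field for prompt-cache hits
        cache_write_tokens: int = 0  # assumed field for prompt-cache writes
        response_id: str = ''


    class Metrics:
        def __init__(self) -> None:
            self.token_usages: list[TokenUsage] = []

        def add_token_usage(self, usage: TokenUsage) -> None:
            # Called once per completion, with values taken from the API
            # response's usage block rather than re-tokenizing locally.
            self.token_usages.append(usage)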

Part of #6707

Follow-up on #5550

Link of any specific issues this addresses
Fix #2947


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:9d3f77f-nikolaik \
  --name openhands-app-9d3f77f \
  docker.all-hands.dev/all-hands-ai/openhands:9d3f77f

@enyst enyst marked this pull request as draft December 14, 2024 20:25
@enyst (Collaborator, Author) commented Dec 15, 2024

@openhands-agent Read the diff of this PR, PR 5604. And add unit tests for the functionality we change or introduce.

Please make sure to explore the existing unit tests a bit to find out whether we already have appropriate test files for these tests; if not, create a new one.

@All-Hands-AI All-Hands-AI deleted a comment from openhands-agent Dec 15, 2024
@enyst enyst added the lint-fix label Dec 15, 2024
enyst pushed a commit to enyst/playground that referenced this pull request Dec 15, 2024
@enyst enyst added lint-fix and removed lint-fix labels Dec 15, 2024
@enyst enyst self-assigned this Dec 15, 2024
@enyst enyst added lint-fix and removed lint-fix labels Dec 15, 2024
@enyst enyst changed the title Make use of litellm 'Usage' data for token counting Use LLM APIs responses in token counting Dec 17, 2024
Several follow-up comments from @enyst and @openhands-agent were marked as outdated.

@All-Hands-AI All-Hands-AI deleted a comment from openhands-agent Jan 21, 2025
@All-Hands-AI All-Hands-AI deleted a comment from openhands-agent Jan 21, 2025
@All-Hands-AI All-Hands-AI deleted a comment from openhands-agent Jan 21, 2025
@All-Hands-AI All-Hands-AI deleted a comment from openhands-agent Jan 21, 2025
@mamoodi (Collaborator) commented Feb 10, 2025

@enyst is this still in flux?

Review thread on a diff excerpt:

    (
        usage
        for usage in metrics.tokens_usages
        if usage.response_id == response_id

@enyst (Collaborator, Author) commented:

This implementation ignores that model_response might have the token data and takes it from TokenUsage instead. We will likely always need response_id, but maybe not all the rest in model_response.
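
For context, a rough sketch of the kind of lookup under discussion; the helper name and the attribute names are assumptions for illustration, not the exact API added by this PR:

    def get_token_usage_for_response(metrics, response_id):
        # Return the usage record matching this LLM response id, or None if
        # the event (e.g. a plain user message) never produced an API response.
        return next(
            (
                usage
                for usage in metrics.token_usages
                if usage.response_id == response_id
            ),
            None,
        )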

@enyst enyst marked this pull request as ready for review February 22, 2025 20:20
@enyst enyst requested a review from csmith49 February 22, 2025 20:20
Review thread on a diff excerpt:

    @@ -17,18 +17,31 @@ class ResponseLatency(BaseModel):
        response_id: str


    class TokensUsage(BaseModel):
Reviewer (Collaborator):

Minor nit, but I'd expect this to be called TokenUsage instead of the pluralized form.

@enyst (Collaborator, Author) replied:

Hah, I felt the same, it reads strangely 😅; o3-mini wanted it for some mysterious reason. Fixed!

They are going to align humans to their preferences, aren't they? 😂

@csmith49 (Collaborator) left a review comment:

LGTM! Super useful to be able to grab token usage info from events after the fact. One question: we link token usage metrics to events by going through the response ID in the tool-call metadata. Do all events have that metadata, even if they're just a user message?

@enyst (Collaborator, Author) commented Feb 23, 2025

(replying to @csmith49's question above)

No, unfortunately we don't have it for MessageActions, FinishActions (at least those started by the user), and any user actions (like bash commands the user types in the UI terminal). One of the utility methods tries to find some info by looking back through history to the last event that had it. It would perhaps be better to fill it in ourselves from other sources (e.g. a tokenizer, or, for Anthropic, its API endpoint that returns token counts). 🤔
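
A rough sketch of the look-back fallback described here, assuming events may carry a tool_call_metadata field holding a litellm model_response; the names and attribute paths are illustrative:

    def find_last_token_usage(events, metrics):
        # Walk the event history backwards and return the most recent usage
        # record that can be linked through tool_call_metadata, or None.
        for event in reversed(events):
            metadata = getattr(event, 'tool_call_metadata', None)
            if metadata is None or metadata.model_response is None:
                continue
            # Assumption: the litellm ModelResponse exposes the provider's response id.
            response_id = metadata.model_response.id
            for usage in metrics.token_usages:
                if usage.response_id == response_id:
                    return usage
        return None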

@enyst enyst merged commit 2d2dbf1 into main Feb 23, 2025
15 checks passed
@enyst enyst deleted the enyst/usage branch February 23, 2025 16:58
Development

Successfully merging this pull request may close these issues:

Feat: make use of litellm's response "usage" data