Use LLM APIs responses in token counting #5604
Conversation
@openhands-agent Read the diff of this PR, PR 5604, and add unit tests for the functionality we change or introduce. Please explore the existing unit tests a bit to see whether we already have test files appropriate for these tests; if not, make a new one.
@enyst this is still in flux?
(
    usage
    for usage in metrics.tokens_usages
    if usage.response_id == response_id
)
This implementation ignores that model_response might have the tokens data and takes it from TokenUsage instead. We will likely always need response_id, but maybe not all the rest in model_response.
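For illustration, a minimal sketch of the two places the counts could come from; the names (metrics.tokens_usages, the usage attribute on the ModelResponse) follow the discussion above and are assumptions, not the PR's exact code:

```python
# Sketch only: contrast reading counts straight from the liteLLM response
# with looking them up in the per-response records kept in metrics.
# All names here are assumptions based on the discussion, not the PR's code.

def tokens_from_model_response(model_response):
    """Read the provider-reported counts off the ModelResponse itself."""
    usage = getattr(model_response, 'usage', None)
    if usage is None:
        return 0, 0
    return usage.prompt_tokens or 0, usage.completion_tokens or 0


def usage_for_response(metrics, response_id):
    """The approach in the diff above: find the stored record by response_id."""
    return next(
        (u for u in metrics.tokens_usages if u.response_id == response_id),
        None,
    )
```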
openhands/llm/metrics.py (Outdated)
@@ -17,18 +17,31 @@ class ResponseLatency(BaseModel):
    response_id: str


class TokensUsage(BaseModel):
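For context, a minimal sketch of what such a per-response model might hold, assuming it mirrors the counts liteLLM reports plus the response_id used to link the record back to an event (the field names are guesses, not the file's actual contents):

```python
from pydantic import BaseModel

# Hypothetical sketch of the per-response usage record discussed here.
# Field names are assumptions; only response_id is confirmed by the thread.
class TokensUsage(BaseModel):
    model: str = ''
    prompt_tokens: int = 0
    completion_tokens: int = 0
    response_id: str = ''
```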
Minor nit, but I'd expect this to be called TokenUsage instead of the pluralized form.
Hah, I felt the same, it reads strange 😅 , o3-mini wanted it for some mysterious reason. Fixed!
They are going to align humans to their preferences, aren't they? 😂
LGTM! Super useful to be able to grab token usage info from events after the fact. One question: we link token usage metrics to events by going through the response ID in the tool-call metadata. Do all events have that metadata, even if they're just a user message?
Co-authored-by: Calvin Smith <[email protected]>
No, unfortunately we don't have it for MessageActions, FinishActions (at least those started by the user), and any user actions (like bash commands that the user writes in the UI terminal). One of the utility methods tries to find some info by looking back in history for the last event that had it. Perhaps it would be better to fill it in ourselves from other sources (e.g. a tokenizer, or for Anthropic, its API has an endpoint that returns token counts). 🤔
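A hedged sketch of the fallback described above, walking history backwards until an event with tool-call metadata is found (the event and metadata attribute names are assumptions, not the PR's exact code):

```python
# Sketch only: reuse the usage record of the most recent event that carried
# tool-call metadata, since user-originated events have none of their own.
def last_known_usage(events, metrics):
    for event in reversed(events):
        metadata = getattr(event, 'tool_call_metadata', None)
        if metadata is None:
            continue
        response_id = metadata.model_response.id
        return next(
            (u for u in metrics.tokens_usages if u.response_id == response_id),
            None,
        )
    return None
```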
End-user friendly description of the problem this fixes or functionality that this introduces
Give a summary of what the PR does, explaining any non-trivial design decisions
This PR proposes to use the liteLLM token usage data explicitly (a rough sketch of what that means follows the linked issues below).
Please note: technically we also have it in the tool_call_metadata optional field in Event, because we have the entire litellm ModelResponse there. This PR proposes to take it out of a ModelResponse, for a few reasons:

Part of #6707
Follow-up on #5550
Link of any specific issues this addresses
Fix #2947
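As a rough illustration of using the liteLLM token usage data explicitly: the provider-reported counts are available directly on the completion response, alongside the response id that can be stored next to them (a sketch, not the PR's code; the model name is only a placeholder):

```python
import litellm

# Sketch: liteLLM attaches the provider-reported usage block to the response,
# along with the response id that can be linked back to an event later.
response = litellm.completion(
    model='gpt-4o-mini',  # placeholder model name
    messages=[{'role': 'user', 'content': 'hello'}],
)
usage = response.usage
print(response.id, usage.prompt_tokens, usage.completion_tokens)
```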
To run this PR locally, use the following command: