Capture number of tokens in a request and response when possible #373

jwmatthews · 2024-09-17T19:34:59Z

We've run into a few situations where it would benefit us if we had a better view of the number of tokens consumed in a request and response.

Let's augment the data we are capturing for tracing and add in any extra info we may get back from the LLM via 'response_metadata'.

Current understanding is that for some models, the response includes metadata that breaks out the number of tokens used in the request and the response.
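
As a rough illustration of that idea, here is a minimal sketch of pulling token counts out of a response's metadata. It assumes a LangChain-style `response_metadata` dict in which some providers place a `token_usage` entry with `prompt_tokens` and `completion_tokens`; other providers may use different keys or omit the entry entirely. The helper name `extract_token_usage` is hypothetical and not part of the project.

```python
from typing import Any, Optional


def extract_token_usage(
    response_metadata: Optional[dict[str, Any]],
) -> Optional[dict[str, int]]:
    """Return prompt/completion/total token counts if present, else None.

    Assumption: usage is stored under a 'token_usage' key, which not every
    provider populates.
    """
    if not response_metadata:
        return None

    usage = response_metadata.get("token_usage")
    if not isinstance(usage, dict):
        return None

    prompt = usage.get("prompt_tokens")
    completion = usage.get("completion_tokens")
    if not isinstance(prompt, int) or not isinstance(completion, int):
        return None

    # Fall back to the sum when the provider does not report a total.
    total = usage.get("total_tokens")
    if not isinstance(total, int):
        total = prompt + completion

    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }
```

Returning `None` rather than raising keeps tracing best-effort: a model that sends no usage data simply produces no token entry in the trace.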

jwmatthews · 2024-09-17T19:39:21Z

@devjpt23 has begun to work on this issue. I wasn't yet able to formally assign him to this issue.

Looks like I can only assign issues to folks in the Konveyor Org, so I formed a new team of 'Collaborators' and invited @devjpt23 to it so he can be assigned future issues.

jwmatthews · 2024-09-20T19:23:26Z

#375 adds the ability to log token request/response usage on successful calls for some models that send back a 'token_usage' in response metadata.

We would like to extend the capability beyond what #375 offers.

- Pre-compute a guess at the tokens consumed in a prompt prior to sending, and log it.
- On a failure response, check whether there is any response metadata on token usage we can find.
- Explore other providers that do not show token usage with #375. Amazon Bedrock is one provider that didn't log token data with #375.
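
For the first item, a pre-send estimate could be as simple as the sketch below. It assumes a rough rule of thumb of about four characters per token for English text, which is only an approximation and varies by model and content. The names `estimate_prompt_tokens`, `log_prompt_estimate`, and `CHARS_PER_TOKEN` are hypothetical.

```python
import logging
import math

logger = logging.getLogger("token_estimate")

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers differ by model; treat the result as a guess.
CHARS_PER_TOKEN = 4


def estimate_prompt_tokens(prompt: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Return a rough, rounded-up token count based on character length."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(prompt) / chars_per_token)


def log_prompt_estimate(prompt: str) -> int:
    """Log the estimated token count before the prompt is sent."""
    estimate = estimate_prompt_tokens(prompt)
    logger.info("estimated prompt tokens: %d (chars=%d)", estimate, len(prompt))
    return estimate
```

A model-specific tokenizer would give a closer count, but a character-based guess needs no extra dependency and is still available when a request fails before any usage metadata comes back.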

jwmatthews · 2025-01-22T16:55:49Z

Let's leave this open. We need someone to dive back into this area and assess whether we have lost this ability after the refactoring to agentic workflows.

We are tracking #568 to have someone go back over the guidance for troubleshooting/trace information with the new agentic workflows. This is a good contender to add to that task.

shawn-hurley · 2025-01-22T17:04:40Z

Removing from next up then. If it is a task of the other one or highly related, I think it shouldn't be in next up. If you disagree, let me know.

JonahSussman assigned devjpt23 Sep 19, 2024

shawn-hurley added the priority/nextup, agent, and rpc-server labels Jan 13, 2025

shawn-hurley mentioned this issue Jan 13, 2025

Add debug info to help us understand the size in tokens of our prompts and responses #137

Open

dymurray added this to Kai Tech Preview Release Jan 21, 2025

dymurray added this to the v0.1.0 milestone Jan 21, 2025

jwmatthews mentioned this issue Jan 22, 2025

Update debugging/troublshooting, and trace information to reflect recent work #568

Open

shawn-hurley removed the priority/nextup label Jan 22, 2025
