-
Context / Scenario
I'm using Kernel Memory as a plugin with Semantic Kernel.

Question
How can I get the count of tokens used by the Kernel Memory plugin?
Replies: 11 comments
-
The number of tokens is logged in the application logs. If you need to count upfront, you can use the provided tokenizers, e.g. we just added tiktoken counters that cover GPT2/GPT3/GPT4 models, and the repo also contains a LLaMA tokenizer for LLaMA-compatible models.
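For illustration, a minimal sketch of counting tokens upfront with one of the provided tokenizers; the concrete tokenizer class name below is an assumption, so check the repo for the classes currently shipped:

```csharp
// Sketch only: counting tokens before sending text to the model.
// ITextTokenizer.CountTokens() follows KM's tokenizer abstraction; the
// concrete class name is assumed -- verify which tiktoken/LLaMA counters
// your KM version ships.
using System;
using Microsoft.KernelMemory.AI;

ITextTokenizer tokenizer = new DefaultGPTTokenizer(); // assumed class name

string prompt = "How can I get the count of tokens used by the KM plugin?";
int tokenCount = tokenizer.CountTokens(prompt);

Console.WriteLine($"Prompt is ~{tokenCount} tokens");
```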
-
@dluc where can I find an example?
-
@rborosak for which model and/or tokenizer?
-
@dluc For my implementation I need to track the tokens used by the LLM. It's a web application, and I need to record user info and tokens used so I can bill users after a period. Reviewing the code, the only approach today is to use the Logger, which counts the tokens with the Tokenizer. Could you implement a Meter and emit a metric, or maybe raise an Event? A rough sketch of what I mean is below.
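A rough sketch of what such a Meter-based counter could look like, using the standard System.Diagnostics.Metrics API; the meter name, counter name and Record helper are hypothetical, not something KM ships:

```csharp
// Sketch only: emitting token usage as a metric so a billing pipeline can
// aggregate it per user. Meter/counter names and the Record helper are
// hypothetical.
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class TokenMetrics
{
    private static readonly Meter KmMeter = new("KernelMemory.Tokens");

    private static readonly Counter<long> TokensUsed =
        KmMeter.CreateCounter<long>("km.tokens.used", unit: "tokens");

    public static void Record(long tokenCount, string userId, string model)
    {
        // Tags let the metrics backend break usage down per user and per model,
        // which is what billing needs.
        TokensUsed.Add(tokenCount,
            new KeyValuePair<string, object?>("user.id", userId),
            new KeyValuePair<string, object?>("model", model));
    }
}
```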
-
I'll see if there's an easy approach that's not too expensive to develop. Can't promise though. We always welcome PRs, or draft PRs to kickstart the process if that might help.
-
Semantic Kernel has the concept of function filters, where you can do things like that, so I guess KM should follow the same approach (a sketch of a filter is below)... which actually makes me wonder whether Kernel Memory should drop the AskAsync functionality and be "just" a memory plugin (SearchAsync) for SK, plus all the indexing capabilities... (don't kill me, just thinking out loud 😄)
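Roughly what a function invocation filter looks like in current SK versions; the interface name has changed across SK releases, and whether the result metadata actually contains a "Usage" entry depends on the connector, so treat the details as assumptions:

```csharp
// Sketch only: an SK filter that runs around every function invocation and
// inspects the result metadata. "Usage" as a metadata key is connector-specific
// and assumed here, not guaranteed.
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

public sealed class TokenUsageFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        await next(context); // run the function, e.g. the KM "ask" plugin call

        if (context.Result.Metadata is { } metadata &&
            metadata.TryGetValue("Usage", out var usage))
        {
            Console.WriteLine($"{context.Function.Name} usage: {usage}");
        }
    }
}

// Registration:
// kernel.FunctionInvocationFilters.Add(new TokenUsageFilter());
```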
-
It's actually the other way around: SK memory plugins need DB connectors to talk to storage, and AskAsync is built on those connectors. KM connectors are an evolution of SK connectors, out of necessity. KM paved the way on the research side, so that these features can land also in SK ;-) Anyway, if you're looking for a plugin, here's the KM plugin for SK: https://github.com/microsoft/kernel-memory/tree/main/clients/dotnet/SemanticKernelPlugin
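A minimal usage sketch, assuming the MemoryPlugin class from that package and a KM service running locally; the endpoint, model id, key and plugin name are placeholders:

```csharp
// Sketch only: exposing Kernel Memory to Semantic Kernel via the plugin.
// Endpoint, model id and API key are placeholders.
using Microsoft.KernelMemory;
using Microsoft.SemanticKernel;

var memory = new MemoryWebClient("http://127.0.0.1:9001/");

var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gpt-4", apiKey: "...")
    .Build();

// Makes the KM functions (ask, search, save, ...) callable from SK prompts,
// planners and function calling under the "memory" plugin name.
kernel.ImportPluginFromObject(new MemoryPlugin(memory), "memory");
```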
-
Yeah, don't get me wrong, I love KM and we're using it in PROD (would it be interesting for a case study??). That said, the AskAsync method offers fewer possibilities than letting SK make the call to the model: calling other plugins, Handlebars templates, streaming...
-
(Feedback appreciated, either way, no worries :-)) I wouldn't compare KM with an agent or a planner, or similar features based on function calling, if that makes sense. For instance, when using function calling there are multiple functions the LLM can choose from, and "Ask" is just one of those functions.

Looking at SK's memory classes, you can see there's a "Search" function, which can be used to put relevant information into a planner/agent context. Then one needs to create another function to leverage that context to answer questions or execute some actions, which IMO are out of "memory" scope.

Looking at KM, we could (and would like to) extend the Ask method to do intent detection and decide how to process the user question, e.g. whether to search for relevant records or to process an entire document without the need for "search". However, in terms of "memory", the scope of actions should be limited to retrieving information -- at least that's the idea of the primary API, and one can leverage the underlying orchestration to do more :-)
-
Please feel free to use #532 to vote for this feature.
-
Update: starting from version 0.96, the KM Ask API returns a Token Usage report. Hope this helps!
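For anyone landing here later, reading it looks roughly like this; the property names on the answer object are an assumption, so check the MemoryAnswer / token usage types in your KM version:

```csharp
// Sketch only: reading the token usage report returned by Ask (KM >= 0.96).
// The TokenUsage property and report field names below are assumed -- verify
// them against the types shipped in your KM version.
using System;
using Microsoft.KernelMemory;

var memory = new MemoryWebClient("http://127.0.0.1:9001/");

var answer = await memory.AskAsync("What is Kernel Memory?");

foreach (var report in answer.TokenUsage)
{
    Console.WriteLine($"{report.ModelName}: " +
                      $"in={report.ServiceTokensIn}, out={report.ServiceTokensOut}");
}
```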