Add prompt caching support for Claude. #226

Merged: 2 commits into brainlid:main on Jan 22, 2025

Conversation

montebrown (Contributor)

  • Both ChatGPT and Claude offer prefix-based prompt caching; unlike ChatGPT, however, Claude does not apply prompt caching automatically.
  • Claude caches tokens up to and including the block marked with 'cache_control'. This can cover tools, system, and messages, in that order.
  • Setting cache_control can now be done at the ContentPart level by setting the :cache_control option to true or %{type: "ephemeral"}. See ChatAnthropicTest for an example, and the sketch after this list.
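
A minimal sketch of the option in use, assuming ContentPart.text!/2 accepts an options list and Message.new_system!/1 accepts a list of ContentParts (per the Notes below); the file name and prompt text are hypothetical:

```elixir
alias LangChain.Message
alias LangChain.Message.ContentPart

# Hypothetical large, stable document: good cache material, since Claude
# caches the prompt prefix across requests.
long_document = File.read!("reference_docs.md")

system =
  Message.new_system!([
    # Claude caches everything up to and including the part marked with
    # cache_control, so the big, unchanging content goes first.
    ContentPart.text!("You are a helpful assistant.\n\n" <> long_document,
      cache_control: true
    ),
    # Parts after the marker are processed normally on every request.
    ContentPart.text!("Answer strictly from the document above.")
  ])
```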

Notes:

  • This change means that system messages can now contain a list of ContentParts, so that cache_control can be set partway through a system message.
  • This does not address collecting the usage metrics: when caching is used, cache_creation_input_tokens and cache_read_input_tokens are returned in the usage metadata in addition to the usual input_tokens and output_tokens (see the shape sketched below).
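
For reference, when caching is active Anthropic's response carries the two extra counters alongside the usual usage fields, roughly like this (values are illustrative):

```elixir
# Usage metadata shape when caching is in play (values illustrative).
# A request that writes the cache reports cache_creation_input_tokens;
# a later request that hits the cache reports cache_read_input_tokens.
%{
  "input_tokens" => 21,
  "output_tokens" => 296,
  "cache_creation_input_tokens" => 1024,
  "cache_read_input_tokens" => 0
}
```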

montebrown and others added 2 commits on January 7, 2025: the prompt caching change described above, and a merge of main bringing in:
  feat: Support for Ollama keep_alive API parameter (brainlid#237)
  support for o1 OpenAI model (brainlid#234)
  minor test cleanup
  phi_4 chat template support fix after merge
  feat: apply chat template from callback (brainlid#231)
  Add Bumblebee Phi-4 (brainlid#233)
  updated changelog
  update version and docs outline (brainlid#229)
  fix: enable verbose_deltas (brainlid#197)
  feat: Enable :inet6 for Req.new (brainlid#227)
  Breaking change: consolidate LLM callback functions (brainlid#228)
brainlid (Owner)

Thanks @montebrown!

I appreciate the update to the docs and tests. Great enhancement!
❤️💛💙💜

brainlid merged commit c2b22c4 into brainlid:main on Jan 22, 2025. 1 check passed.
brainlid (Owner)

@montebrown,

Heads up that PR #236 would expose the additional custom token usage information for cached parts.
