Add prompt caching support for Claude. #226

Merged: 2 commits into brainlid:main on Jan 22, 2025

Conversation

montebrown (Contributor)

  • Both ChatGPT and Claude offer prefix-based prompt caching; unlike ChatGPT, however, Claude does not apply prompt caching automatically.
  • Claude caches tokens up to and including the block marked with 'cache_control'. This can cover tools, system, and messages, in that order.
  • Setting cache_control can now be done at the ContentPart level by setting the :cache_control option to true or %{type: "ephemeral"}. See ChatAnthropicTest for an example, and the sketch after this list.
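
A minimal sketch of the option in use, assuming ContentPart.text!/2 accepts an options list and Message.new_system!/1 accepts a list of ContentParts (per the Notes below); the file name and prompt text are hypothetical:

```elixir
alias LangChain.Message
alias LangChain.Message.ContentPart

# Hypothetical large, stable document: good cache material, since Claude
# caches the prompt prefix across requests.
long_document = File.read!("reference_docs.md")

system =
  Message.new_system!([
    # Claude caches everything up to and including the part marked with
    # cache_control, so the big, unchanging content goes first.
    ContentPart.text!("You are a helpful assistant.\n\n" <> long_document,
      cache_control: true
    ),
    # Parts after the marker are processed normally on every request.
    ContentPart.text!("Answer strictly from the document above.")
  ])
```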

Notes:

  • This change means that system messages can now contain a list of ContentParts, so that cache_control can be set partway through a system message.
  • This does not address collecting the usage metrics: when caching is used, cache_creation_input_tokens and cache_read_input_tokens are returned in the usage metadata in addition to the usual input_tokens and output_tokens (see the shape sketched below).
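
For reference, when caching is active Anthropic's response carries the two extra counters alongside the usual usage fields, roughly like this (values are illustrative):

```elixir
# Usage metadata shape when caching is in play (values illustrative).
# A request that writes the cache reports cache_creation_input_tokens;
# a later request that hits the cache reports cache_read_input_tokens.
%{
  "input_tokens" => 21,
  "output_tokens" => 296,
  "cache_creation_input_tokens" => 1024,
  "cache_read_input_tokens" => 0
}
```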

montebrown and others added 2 commits on January 7, 2025: the prompt caching change described above, and a merge of main bringing in:
  feat: Support for Ollama keep_alive API parameter (brainlid#237)
  support for o1 OpenAI model (brainlid#234)
  minor test cleanup
  phi_4 chat template support fix after merge
  feat: apply chat template from callback (brainlid#231)
  Add Bumblebee Phi-4 (brainlid#233)
  updated changelog
  update version and docs outline (brainlid#229)
  fix: enable verbose_deltas (brainlid#197)
  feat: Enable :inet6 for Req.new (brainlid#227)
  Breaking change: consolidate LLM callback functions (brainlid#228)
brainlid (Owner)

Thanks @montebrown!

I appreciate the update to the docs and tests. Great enhancement!
❤️💛💙💜

brainlid merged commit c2b22c4 into brainlid:main on Jan 22, 2025. 1 check passed.
brainlid (Owner)

@montebrown,

Heads up that PR #236 would expose the additional custom token usage information for cached parts.
