
Feat/anthropic extended ttl #6205


Open · wants to merge 10 commits into main
7 changes: 7 additions & 0 deletions core/index.d.ts
@@ -927,6 +927,13 @@ export interface RequestOptions {
export interface CacheBehavior {
cacheSystemMessage?: boolean;
cacheConversation?: boolean;
useExtendedCacheTtlBeta?: boolean; // Opt in to Anthropic's extended-cache-ttl-2025-04-11 beta
Contributor commented:
I'm coming at this review with the lens of "if we add it now, we'll have to support it forever (or go through a deliberate deprecation process)". I'm worried there are a large number of options here that aren't going to be relevant forever, or that they might not be the final form of this configuration.

It would be helpful to better understand whether all of the cacheUserMessages, cacheAssistantMessages, etc. options are truly necessary for people to customize, or whether we just need to set more sensible defaults. For example, I'd be curious what values you set here and whether you think we should just ship those as the defaults for everyone; usage patterns in Continue are probably similar across a variety of users. Not that we couldn't eventually also allow this customization, but holding off might save a lot of maintenance (and give many users money back without their needing to configure anything).
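
For concreteness, the two shapes being weighed here might look like this. This is a hypothetical sketch using the CacheBehavior interface above; both objects and their values are illustrative, not taken from this PR:

// As proposed: every knob exposed on CacheBehavior.
const perUserConfig: CacheBehavior = {
  cacheConversation: true,
  useExtendedCacheTtlBeta: true,
  cacheTtl: "1h",
  cacheUserMessages: 2,
  cacheToolResults: 2, // illustrative value
};

// The alternative floated above: bake values like these in as defaults
// and expose no new per-type options at all.
const builtInDefaults: CacheBehavior = {
  cacheConversation: true,
  cacheUserMessages: 2,
};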

cacheTtl?: string; // TTL for the extended-TTL beta, e.g. "5m" (the default) or "1h"
// Enhanced per-type caching options
cacheUserMessages?: number; // Number of recent user messages to cache (default: 2)
cacheAssistantMessages?: number; // Number of recent assistant messages to cache (default: 0)
cacheToolResults?: number; // Number of recent tool result messages to cache (default: 0)
cacheAssistantToolCalls?: number; // Number of recent assistant tool call messages to cache (default: 0)
}

export interface ClientCertificateOptions {
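Read together, the per-type counters above all mean "add cache_control to the last N messages of that role." A minimal standalone sketch of that selection rule, assuming the semantics described in the comments (this helper is illustrative and not part of the PR):

type Role = "user" | "assistant" | "tool";

// Returns the indices of the last `n` messages with the given role,
// mirroring the "cache the N most recent X messages" semantics above.
function lastNIndices(roles: Role[], role: Role, n: number): Set<number> {
  if (n <= 0) return new Set(); // n = 0 disables caching for this role
  const indices = roles
    .map((r, i) => (r === role ? i : -1))
    .filter((i) => i !== -1)
    .slice(-n);
  return new Set(indices);
}

// Example: cacheUserMessages = 2 over a five-message history.
// Logs Set(2) { 3, 4 } — only the last two user messages are marked.
console.log(
  lastNIndices(["user", "assistant", "tool", "user", "user"], "user", 2),
);
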
148 changes: 128 additions & 20 deletions core/llm/llms/Anthropic.ts
@@ -48,7 +48,7 @@ class Anthropic extends BaseLLM {
return finalOptions;
}

private convertMessage(message: ChatMessage, addCaching: boolean): any {
protected convertMessage(message: ChatMessage, addCaching: boolean): any {
if (message.role === "tool") {
return {
role: "user",
@@ -57,17 +57,39 @@
type: "tool_result",
tool_use_id: message.toolCallId,
content: renderChatMessage(message) || undefined,
// Add caching support for tool results
...(addCaching
? {
cache_control: this.cacheBehavior?.useExtendedCacheTtlBeta
? {
type: "ephemeral",
ttl: this.cacheBehavior?.cacheTtl ?? "5m",
}
: { type: "ephemeral" },
}
: {}),
},
],
};
} else if (message.role === "assistant" && message.toolCalls) {
return {
role: "assistant",
content: message.toolCalls.map((toolCall) => ({
content: message.toolCalls.map((toolCall, index) => ({
type: "tool_use",
id: toolCall.id,
name: toolCall.function?.name,
input: safeParseToolCallArgs(toolCall),
// Add caching support for assistant tool calls (last tool call only)
...(addCaching && index === message.toolCalls!.length - 1
? {
cache_control: this.cacheBehavior?.useExtendedCacheTtlBeta
? {
type: "ephemeral",
ttl: this.cacheBehavior?.cacheTtl ?? "5m",
}
: { type: "ephemeral" },
}
: {}),
})),
};
} else if (message.role === "thinking" && !message.redactedThinking) {
@@ -100,7 +122,16 @@
{
type: "text",
text: message.content,
...(addCaching ? { cache_control: { type: "ephemeral" } } : {}),
...(addCaching
? {
cache_control: this.cacheBehavior?.useExtendedCacheTtlBeta
? {
type: "ephemeral",
ttl: this.cacheBehavior?.cacheTtl ?? "5m",
}
: { type: "ephemeral" },
}
: {}),
},
],
};
@@ -115,7 +146,14 @@
...part,
// If multiple text parts, only add cache_control to the last one
...(addCaching && contentIdx === message.content.length - 1
? { cache_control: { type: "ephemeral" } }
? {
cache_control: this.cacheBehavior?.useExtendedCacheTtlBeta
? {
type: "ephemeral",
ttl: this.cacheBehavior?.cacheTtl ?? "5m",
}
: { type: "ephemeral" },
}
: {}),
};
return newpart;
@@ -132,24 +170,87 @@
};
}

protected shouldCacheMessage(
message: ChatMessage,
index: number,
filteredMessages: ChatMessage[],
): boolean {
if (!this.cacheBehavior?.cacheConversation) {
return false;
}

const {
cacheUserMessages = 2,
cacheAssistantMessages = 0,
cacheToolResults = 0,
cacheAssistantToolCalls = 0,
} = this.cacheBehavior;

switch (message.role) {
case "user":
if (cacheUserMessages > 0) {
const userMessages = filteredMessages.filter(
(m) => m.role === "user",
);
const userIndex = userMessages.findIndex((m) => m === message);
return userIndex >= userMessages.length - cacheUserMessages;
}
break;

case "assistant":
if (message.toolCalls && cacheAssistantToolCalls > 0) {
const assistantToolMessages = filteredMessages.filter(
(m) => m.role === "assistant" && m.toolCalls,
);
const assistantIndex = assistantToolMessages.findIndex(
(m) => m === message,
);
return (
assistantIndex >=
assistantToolMessages.length - cacheAssistantToolCalls
);
} else if (!message.toolCalls && cacheAssistantMessages > 0) {
const assistantMessages = filteredMessages.filter(
(m) => m.role === "assistant" && !m.toolCalls,
);
const assistantIndex = assistantMessages.findIndex(
(m) => m === message,
);
return (
assistantIndex >= assistantMessages.length - cacheAssistantMessages
);
}
break;

case "tool":
if (cacheToolResults > 0) {
const toolMessages = filteredMessages.filter(
(m) => m.role === "tool",
);
const toolIndex = toolMessages.findIndex((m) => m === message);
return toolIndex >= toolMessages.length - cacheToolResults;
}
break;
}

return false;
}

public convertMessages(msgs: ChatMessage[]): any[] {
// should be public for use within VertexAI
const filteredmessages = msgs.filter(
(m) => m.role !== "system" && !!m.content,
(m) =>
m.role !== "system" &&
(!!m.content || (m.role === "assistant" && m.toolCalls)),
);
const lastTwoUserMsgIndices = filteredmessages
.map((msg, index) => (msg.role === "user" ? index : -1))
.filter((index) => index !== -1)
.slice(-2);

const messages = filteredmessages.map((message, filteredMsgIdx) => {
// Add cache_control parameter to the last two user messages
// The second-to-last because it retrieves potentially already cached contents,
// The last one because we want it cached for later retrieval.
// See: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
const addCaching =
this.cacheBehavior?.cacheConversation &&
lastTwoUserMsgIndices.includes(filteredMsgIdx);
// Enhanced caching logic that supports tool messages
const addCaching = this.shouldCacheMessage(
message,
filteredMsgIdx,
filteredmessages,
);

const chatMessage = this.convertMessage(message, !!addCaching);
return chatMessage;
@@ -194,9 +295,11 @@
Accept: "application/json",
"anthropic-version": "2023-06-01",
"x-api-key": this.apiKey as string,
...(shouldCacheSystemMessage || this.cacheBehavior?.cacheConversation
? { "anthropic-beta": "prompt-caching-2024-07-31" }
: {}),
...(this.cacheBehavior?.useExtendedCacheTtlBeta
? { "anthropic-beta": "extended-cache-ttl-2025-04-11" }
: shouldCacheSystemMessage || this.cacheBehavior?.cacheConversation
? { "anthropic-beta": "prompt-caching-2024-07-31" }
: {}),
},
body: JSON.stringify({
...this.convertArgs(options),
@@ -206,7 +309,12 @@
{
type: "text",
text: systemMessage,
cache_control: { type: "ephemeral" },
cache_control: this.cacheBehavior?.useExtendedCacheTtlBeta
? {
type: "ephemeral",
ttl: this.cacheBehavior?.cacheTtl ?? "5m",
}
: { type: "ephemeral" },
},
]
: systemMessage,
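
Putting the pieces together: with useExtendedCacheTtlBeta enabled and a cacheTtl of "1h" configured, the request this provider builds looks roughly as follows. This is a hand-written sketch based on the diff above, not captured output, and the message text is invented:

// Sketch of the outgoing request when the extended-TTL beta is enabled.
const request = {
  headers: {
    Accept: "application/json",
    "anthropic-version": "2023-06-01",
    "x-api-key": "<key>",
    // Replaces the prompt-caching-2024-07-31 beta header:
    "anthropic-beta": "extended-cache-ttl-2025-04-11",
  },
  body: {
    system: [
      {
        type: "text",
        text: "You are a helpful assistant.", // invented
        cache_control: { type: "ephemeral", ttl: "1h" },
      },
    ],
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Summarize this repository.", // invented
            cache_control: { type: "ephemeral", ttl: "1h" },
          },
        ],
      },
    ],
  },
};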