🤖 fix: enable prompt caching for mux-gateway Anthropic models #816
Conversation
The gateway provider sends requests in the SDK internal format (`json.prompt`) rather than the Anthropic API format (`json.messages`). Additionally, the gateway server only translates `providerOptions.anthropic.cacheControl` at the message level, not at the content part level.

This fix:

1. Detects gateway format by checking for a `json.prompt` array
2. Adds `providerOptions.anthropic.cacheControl` at the message level for gateway requests
3. Keeps `cache_control` injection at the content part level for direct Anthropic

Without this fix, only system messages were cached (~5.4k tokens) because they already had message-level `providerOptions`. Conversation history was not cached because `applyCacheControl()` sets `providerOptions` at the content part level, which the gateway server ignores.
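As a rough sketch of the detection and injection described above (the standalone helper, its name, and the `Message` type are assumptions for illustration; only the field names `json.prompt`, `json.messages`, `providerOptions.anthropic.cacheControl`, and `cache_control` come from the PR):

```typescript
// Hypothetical sketch, not the actual mux source: gateway-aware
// cache-control injection on a parsed request body.
type Message = {
  role: string;
  content: unknown;
  providerOptions?: Record<string, unknown>;
};

function applyCacheControlToBody(json: Record<string, unknown>): void {
  // Gateway requests arrive in AI SDK internal format: a `prompt` array.
  const isGateway = Array.isArray(json.prompt);
  const messages = (isGateway ? json.prompt : json.messages) as
    | Message[]
    | undefined;
  if (!messages || messages.length === 0) return;

  const lastMsg = messages[messages.length - 1];
  if (isGateway) {
    // The gateway server only translates message-level providerOptions,
    // so mark the whole message.
    lastMsg.providerOptions = {
      ...(lastMsg.providerOptions ?? {}),
      anthropic: { cacheControl: { type: "ephemeral" } },
    };
  } else {
    // Direct Anthropic API: mark the last content part instead.
    const content = lastMsg.content;
    if (Array.isArray(content) && content.length > 0) {
      const lastPart = content[content.length - 1] as Record<string, unknown>;
      lastPart.cache_control ??= { type: "ephemeral" };
    }
  }
}
```

In the real wrapper this logic would run inside a `fetch` override that parses and re-serializes the request body before forwarding it.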
💡 Codex Review
Here are some automated review suggestions for this pull request.
```ts
// For direct Anthropic: add cache_control to last content part
const content = lastMsg.content;
if (Array.isArray(content) && content.length > 0) {
  // Array content: add cache_control to last part
  const lastPart = content[content.length - 1] as Record<string, unknown>;
  lastPart.cache_control ??= { type: "ephemeral" };
}
```
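For contrast, the two message shapes at stake differ in where the cache marker lives. These literals are illustrative only (values invented; field names taken from the PR description):

```typescript
// Gateway (json.prompt): cache marker at the message level via providerOptions.
const gatewayMessage = {
  role: "user",
  content: "Summarize the conversation so far.",
  providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
};

// Direct Anthropic (json.messages): cache marker on the last content part.
const directMessage = {
  role: "user",
  content: [
    {
      type: "text",
      text: "Summarize the conversation so far.",
      cache_control: { type: "ephemeral" },
    },
  ],
};
```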
Avoid adding cache_control to gateway prompts
Gateway Anthropic requests use the AI SDK `json.prompt` schema, and the function notes that the gateway rejects raw `cache_control` fields. Yet this block still injects `cache_control` into the last prompt content part when `messages` resolves from `json.prompt`. As a result, gateway chat requests with array content would carry Anthropic-specific fields the gateway schema doesn't accept, leading to 400/validation errors instead of enabling caching for those calls.
Not true: this function is only called for Anthropic.
@codex review
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
Problem
Prompt caching wasn't working for Anthropic models accessed via mux-gateway. Only the system message (~5.4k tokens) was being cached, but conversation history was not.
Root Cause
The gateway provider uses a different request format and translation mechanism:

| Provider | Request format | `providerOptions` translation |
| --- | --- | --- |
| Direct Anthropic | `json.messages` | at all levels |
| Gateway | `json.prompt` | at message level only |

Our `applyCacheControl()` was setting `providerOptions.anthropic.cacheControl` at the content part level, which the gateway server ignores. Only `createCachedSystemMessage()` was setting it at the message level (hence system prompt caching worked).

Fix
Update `wrapFetchWithAnthropicCacheControl` to:

1. Detect gateway format by checking for a `json.prompt` array
2. Add `providerOptions.anthropic.cacheControl` at the message level for gateway requests
3. Keep `cache_control` injection at the content part level for direct Anthropic

Testing
Verified caching works by checking that cache read tokens increase on subsequent messages.
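A minimal sketch of that check, assuming the response exposes Anthropic's usage counters (`cache_read_input_tokens` and `cache_creation_input_tokens` are real Messages API usage fields; the helper name and threshold are invented):

```typescript
// Shape of Anthropic's Messages API usage block; the cache fields are
// optional and only appear when prompt caching is active.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Hypothetical helper: caching is working if a follow-up request reads a
// non-trivial number of tokens back from the cache.
function cachingVerified(followUp: AnthropicUsage, minTokens = 1024): boolean {
  return (followUp.cache_read_input_tokens ?? 0) >= minTokens;
}
```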
Generated with mux