
Conversation

@ethanndickson (Member)

Problem

Prompt caching wasn't working for Anthropic models accessed via mux-gateway. Only the system message (~5.4k tokens) was being cached, but conversation history was not.

Root Cause

The gateway provider uses a different request format and translation mechanism:

| Provider | Request Format | Cache Control Translation |
| --- | --- | --- |
| Direct Anthropic | `json.messages` | SDK translates `providerOptions` at all levels |
| Gateway | `json.prompt` | Gateway server only translates `providerOptions` at the message level |

Our `applyCacheControl()` was setting `providerOptions.anthropic.cacheControl` at the content part level, which the gateway server ignores. Only `createCachedSystemMessage()` was setting it at the message level (hence system prompt caching worked).
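The two placements look like this (a minimal sketch: the shapes follow the AI SDK `providerOptions` convention named above, but the example text and exact surrounding schema are assumptions, not code from this PR):

```typescript
// Message-level placement: the only level the gateway server translates
// into Anthropic's cache_control. This is the shape that
// createCachedSystemMessage() produced, which is why system-prompt
// caching already worked.
const messageLevel = {
  role: "user",
  content: [{ type: "text", text: "example user turn" }],
  providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
};

// Content-part-level placement: translated by the SDK on the direct
// Anthropic path, but ignored by the gateway server. This is the shape
// applyCacheControl() produced, so conversation history was never cached
// through the gateway.
const contentPartLevel = {
  role: "user",
  content: [
    {
      type: "text",
      text: "example user turn",
      providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
    },
  ],
};
```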

Fix

Update `wrapFetchWithAnthropicCacheControl` to:

  1. Detect gateway format by checking for a `json.prompt` array
  2. Add `providerOptions.anthropic.cacheControl` at the message level for gateway requests
  3. Keep `cache_control` injection at the content part level for direct Anthropic
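A sketch of that dispatch (the wrapper name comes from the PR; the helper names, and the detail of targeting the last message/part, are illustrative assumptions rather than the actual implementation):

```typescript
type Json = Record<string, unknown>;

// Gateway requests carry the AI SDK internal `prompt` array instead of
// the Anthropic API `messages` array.
function isGatewayFormat(json: Json): boolean {
  return Array.isArray(json.prompt);
}

// Gateway path: set providerOptions.anthropic.cacheControl at the message
// level, the only level the gateway server translates.
function applyGatewayCacheControl(json: Json): void {
  const prompt = json.prompt as Json[];
  const lastMsg = prompt[prompt.length - 1];
  if (!lastMsg) return;
  lastMsg.providerOptions ??= {};
  const po = lastMsg.providerOptions as Json;
  po.anthropic ??= {};
  (po.anthropic as Json).cacheControl ??= { type: "ephemeral" };
}

// Direct Anthropic path: inject cache_control on the last content part,
// which the Anthropic API accepts natively.
function applyDirectCacheControl(json: Json): void {
  const messages = json.messages as Json[];
  const lastMsg = messages[messages.length - 1];
  if (!lastMsg) return;
  const content = lastMsg.content;
  if (Array.isArray(content) && content.length > 0) {
    const lastPart = content[content.length - 1] as Json;
    lastPart.cache_control ??= { type: "ephemeral" };
  }
}

// Entry point: detect the request shape, then apply the matching strategy.
function applyCacheControlToRequest(json: Json): void {
  if (isGatewayFormat(json)) applyGatewayCacheControl(json);
  else if (Array.isArray(json.messages)) applyDirectCacheControl(json);
}
```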

Testing

Verified caching works by checking that cache read tokens increase on subsequent messages.
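That check can be expressed as a tiny helper (the `usage` field names are Anthropic's documented response fields; the helper itself only illustrates the verification and is not code from this PR):

```typescript
// Relevant Anthropic API response usage fields (documented names).
interface CacheUsage {
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Returns true when the second response read more tokens from cache than
// the first, i.e. history written on turn one was hit on turn two.
function cacheReadsIncreased(first: CacheUsage, second: CacheUsage): boolean {
  return (
    (second.cache_read_input_tokens ?? 0) > (first.cache_read_input_tokens ?? 0)
  );
}
```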


Generated with mux

The gateway provider sends requests in SDK internal format (`json.prompt`)
rather than Anthropic API format (`json.messages`). Additionally, the
gateway server only translates `providerOptions.anthropic.cacheControl`
at the message level, not at the content part level.

This fix:
1. Detects gateway format by checking for a `json.prompt` array
2. Adds `providerOptions.anthropic.cacheControl` at the message level for gateway
3. Keeps `cache_control` injection at the content part level for direct Anthropic

Without this fix, only system messages were cached (~5.4k tokens) because
they already had message-level `providerOptions`. Conversation history was
not cached because `applyCacheControl()` sets `providerOptions` at the
content part level, which the gateway server ignores.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +150 to 154
```typescript
// For direct Anthropic: add cache_control to last content part
const content = lastMsg.content;
if (Array.isArray(content) && content.length > 0) {
  // Array content: add cache_control to last part
  const lastPart = content[content.length - 1] as Record<string, unknown>;
  lastPart.cache_control ??= { type: "ephemeral" };
```


P1: Avoid adding `cache_control` to gateway prompts

Gateway Anthropic requests use the AI SDK `json.prompt` schema, and the function notes that the gateway rejects raw `cache_control` fields, yet this block still injects `cache_control` into the last prompt content part when `messages` resolves from `json.prompt`. That means gateway chat requests with array content will now carry Anthropic-specific fields the gateway schema doesn't accept, leading to 400/validation errors instead of enabling caching for those calls.


@ethanndickson (Member, Author) Dec 1, 2025


Not true; this function is only called for Anthropic.

@ethanndickson (Member, Author)

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.


@ethanndickson ethanndickson added this pull request to the merge queue Dec 1, 2025
Merged via the queue into main with commit da9acc6 Dec 1, 2025
18 of 19 checks passed
@ethanndickson ethanndickson deleted the mux-gateway-caching-fix branch December 1, 2025 13:56