
codegen-sh bot commented on Mar 19, 2025

Description

This PR implements Anthropic's prompt caching feature for supported Claude models in the CodeAgent class. Prompt caching allows reusing large portions of prompts across multiple API calls, reducing costs by up to 90% for cached content and improving latency by up to 85% for long prompts.

Changes

  1. Added an `enable_prompt_caching` parameter to the LLM class, defaulting to `False`
  2. Added support for the `anthropic-beta: prompt-caching-2024-07-31` header when prompt caching is enabled
  3. Added validation to ensure prompt caching is only enabled for supported models (Claude 3.5 Sonnet and Claude 3 Haiku)
  4. Added an `enable_prompt_caching` parameter to the CodeAgent class, defaulting to `True` for Claude models (see the sketch below)
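
A minimal sketch of what changes 1-3 might look like, assuming a thin wrapper around the `anthropic` SDK; the class, method, and model-ID strings here are illustrative assumptions, not the repo's actual API:

```python
from anthropic import Anthropic

# Models that supported the prompt-caching beta at the time of this PR.
PROMPT_CACHING_SUPPORTED_MODELS = {
    "claude-3-5-sonnet-20240620",
    "claude-3-haiku-20240307",
}


class LLM:
    """Illustrative wrapper; the real class in this repo may differ."""

    def __init__(self, model: str, enable_prompt_caching: bool = False):
        # Change 3: only allow caching on models that support the beta.
        if enable_prompt_caching and model not in PROMPT_CACHING_SUPPORTED_MODELS:
            raise ValueError(f"Prompt caching is not supported for {model!r}")
        self.model = model
        self.enable_prompt_caching = enable_prompt_caching
        self._client = Anthropic()

    def complete(self, messages: list[dict], max_tokens: int = 1024) -> str:
        # Change 2: opt into the beta by sending the anthropic-beta header.
        extra_headers = (
            {"anthropic-beta": "prompt-caching-2024-07-31"}
            if self.enable_prompt_caching
            else {}
        )
        response = self._client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=messages,
            extra_headers=extra_headers,
        )
        return response.content[0].text
```

Change 4 would then have CodeAgent pass `enable_prompt_caching=True` through to this constructor when the configured model is a Claude model.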

Benefits

  • Reduced costs: Cached prompts can reduce input token costs by up to 90%
  • Improved latency: Response times can be cut by up to 85% for long prompts
  • More usable context: makes it practical to include additional context and examples in prompts without the usual cost and latency penalties
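
As a rough illustration using Anthropic's published Claude 3.5 Sonnet pricing at the time ($3.00/MTok input, $0.30/MTok cache read, $3.75/MTok cache write): a 50,000-token prefix reread across ten calls costs 10 × 0.05 × $0.30 = $0.15 when served from the cache versus 10 × 0.05 × $3.00 = $1.50 uncached, which is the roughly 90% saving cited above; the first call pays the 25% cache-write premium (0.05 × $3.75 ≈ $0.19).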

Notes

  • Prompt caching is currently in beta and only supported on Claude 3.5 Sonnet and Claude 3 Haiku
  • The cache has a 5-minute lifetime, refreshed each time the cached content is used
  • This implementation enables the feature but doesn't yet include the `cache_control` parameter for marking specific content as cacheable; that would require changes to the prompt structure and can be implemented in a future PR if needed (a sketch follows below)
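
For reference, a hedged sketch of what that follow-up might look like, using the `cache_control` field from Anthropic's documented message format (the context variable is a placeholder):

```python
# Mark a long, stable prefix as cacheable; later calls that repeat this
# block verbatim can be served from the cache instead of being reprocessed.
long_codebase_context = "..."  # placeholder for the large, reusable prefix

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": long_codebase_context,
                "cache_control": {"type": "ephemeral"},
            },
            {"type": "text", "text": "Summarize the module above."},
        ],
    }
]
```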
