@zastrowm zastrowm commented Jan 5, 2026

Description

The current retry logic for handling ModelThrottledException is hardcoded in event_loop.py with fixed values (6 attempts, exponential backoff starting at 4s). This makes it impossible for users to customize retry behavior for their specific use cases, such as:

  • Different rate limits for different models or API endpoints
  • Custom backoff strategies (linear, jittered exponential, etc.)
  • Disabling retries entirely for certain scenarios
  • Implementing custom retry conditions beyond just throttling

This PR refactors the hardcoded retry logic into a flexible, hook-based system that allows users to configure retry behavior via a new retry_strategy parameter on the Agent.

Public API Changes

Added a new retry_strategy parameter to Agent.__init__():

from strands import Agent, ModelRetryStrategy

agent = Agent(
    model="anthropic.claude-3-sonnet",
    retry_strategy=ModelRetryStrategy(
        max_attempts=3,
        initial_delay=2,
        max_delay=60
    )
)
# Retries up to 2 times with 2s-60s exponential backoff

NoopRetryStrategy disables retries entirely:

from strands import Agent, NoopRetryStrategy

agent = Agent(
    model="anthropic.claude-3-sonnet",
    retry_strategy=NoopRetryStrategy()
)
# Raises ModelThrottledException immediately without retries

The retry_strategy parameter accepts any HookProvider that implements retry logic via the AfterModelCallEvent hook. Two built-in strategies are provided:

  • ModelRetryStrategy — Exponential backoff retry with configurable parameters (default)
  • NoopRetryStrategy — Disables retries entirely

Custom retry strategies can be implemented by creating a hook provider that sets event.retry = True on the AfterModelCallEvent when a retry should occur.
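
As a rough sketch of what such a custom strategy could look like, the snippet below implements linear backoff. To keep it runnable without the SDK, AfterModelCallEvent and ModelThrottledException are local stand-ins that model only the fields used here (exception and retry). LinearRetryStrategy, its step parameter, and the injectable sleep are hypothetical names, and the register_hooks / add_callback shape is assumed to match the SDK's hook provider pattern.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class ModelThrottledException(Exception):
    """Local stand-in for the SDK's throttling exception."""


@dataclass
class AfterModelCallEvent:
    """Local stand-in: only the fields this sketch relies on."""
    exception: Optional[Exception] = None
    retry: bool = False


class LinearRetryStrategy:
    """Hypothetical strategy: wait step, 2*step, ... between throttled attempts."""

    def __init__(self, max_attempts: int = 3, step: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_attempts = max_attempts
        self.step = step
        self._sleep = sleep      # injectable so the delay can be faked in tests
        self._failures = 0       # consecutive throttled attempts so far

    def register_hooks(self, registry) -> None:
        # Assumed registration shape; the registry is supplied by the agent.
        registry.add_callback(AfterModelCallEvent, self.on_after_model_call)

    def on_after_model_call(self, event: AfterModelCallEvent) -> None:
        if not isinstance(event.exception, ModelThrottledException):
            self._failures = 0   # success or unrelated error: do not retry
            return
        self._failures += 1
        if self._failures >= self.max_attempts:
            self._failures = 0   # out of attempts: let the exception propagate
            return
        self._sleep(self.step * self._failures)
        event.retry = True       # ask the event loop to call the model again
```

With max_attempts=3 and step=2.0, this sketch waits 2s and then 4s before giving up on the third throttled attempt. A blocking time.sleep may not suit an async event loop, which is one reason the sleep function is injectable here.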

Backwards Compatibility

The default behavior is largely unchanged: agents still retry up to 5 times (6 attempts in total) with exponential backoff. EventLoopThrottleEvent and ForceStopEvent are still emitted during retries, maintaining backwards compatibility with existing hooks that listen for these events.

The exact delay times have changed, however. Because of a bug in the original logic, the initial delay was doubled before it was first used (see test_agent_events.py for the test changes that accommodate this). Before these changes, the delays were:

8s, 16s, 32s, 64s, 128s

After these changes, the delays are:

4s, 8s, 16s, 32s, 64s

I think these changes are acceptable, however.
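
The two schedules above can be reproduced with a small sketch. It is not the code from event_loop.py: the function names are hypothetical, and the max_delay cap of 240 seconds is an assumption that neither schedule reaches with 6 attempts. The only difference between the two functions is whether the delay is doubled before or after it is used.

```python
def delays_before(initial=4, max_delay=240, max_attempts=6):
    """Old behavior: the delay is doubled before the first sleep."""
    delay, schedule = initial, []
    for _ in range(max_attempts - 1):   # one sleep between each pair of attempts
        delay = min(delay * 2, max_delay)
        schedule.append(delay)
    return schedule


def delays_after(initial=4, max_delay=240, max_attempts=6):
    """New behavior: sleep for the current delay, then double it."""
    delay, schedule = initial, []
    for _ in range(max_attempts - 1):
        schedule.append(delay)
        delay = min(delay * 2, max_delay)
    return schedule


print(delays_before())  # [8, 16, 32, 64, 128]
print(delays_after())   # [4, 8, 16, 32, 64]
```

In this sketch the total worst-case wait drops from 248s to 124s, which is the practical effect of the change for callers that exhaust every attempt.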

Implementation Decisions

  • We preserve backwards compatibility by emitting EventLoopThrottleEvent and ForceStopEvent as we used to.

    • Reasoning: existing hooks that listen for these events keep working.
  • We do not emit EventLoopThrottleEvent for other retry strategies.

    • Reasoning: we don't have a good way to determine whether the user's strategy delayed before retrying.
    • In the future, one way to support this would be to allow hooks to emit events.
  • We do emit ForceStopEvent whenever an exception bubbles out of the model invocation

    • This seems to be the convention
  • Hooks are the way that you implement a retry strategy:

    • Reasoning: hooks are a general-purpose mechanism, so the same approach can later support retries in other places (like tools).
  • Naming

    • We name it retry_strategy so that as hooks are expanded to allow retrying tools, we can also enable tool retry strategies
    • We name it ModelRetryStrategy since it's only focused on model retries - in the future we might vend other strategies, but we can add a new strategy rather than attempting to fit it all into this one.
    • We name it NoopRetryStrategy because it's the easy way to turn it off

Related Issues

Documentation PR

Type of Change

Bug fix
New feature
Breaking change
Documentation update
Other (please describe):

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

strands-agent and others added 2 commits January 5, 2026 13:41
Refactored hardcoded retry logic in event_loop into a flexible,
hook-based retry system that allows users to customize retry behavior.
codecov bot commented Jan 5, 2026

Codecov Report

❌ Patch coverage is 95.31250% with 3 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/strands/agent/retry.py | 94.11% | 2 Missing and 1 partial ⚠️


zastrowm added a commit to zastrowm/docs that referenced this pull request Jan 8, 2026
…igh-Level constructs

While doing API bar raising for strands-agents/sdk-python/pull/1424, we determined that HookProvider is too low-level an interface to expose directly to integrators. This captures that decision and its reasoning in log format and sets us up to record future decisions in a similar way going forward.

See DECISIONS.md for the decision and the format.

2 participants