Implement a retry_strategy object for retrying operations on the agent #15

@zastrowm

Overview

Implement a configurable retry_strategy parameter for the Agent class to replace hardcoded retry logic with a flexible, hook-based retry system. This will allow users to configure retry behavior and implement custom retry strategies.

Current State

The SDK currently has hardcoded retry logic in event_loop.py:

  • MAX_ATTEMPTS = 6
  • INITIAL_DELAY = 4 seconds
  • MAX_DELAY = 240 seconds (4 minutes)
  • Exponential backoff for ModelThrottledException
  • Emits EventLoopThrottleEvent during retries
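
For reference, the wait schedule implied by these constants can be sketched as follows. This is an illustration rather than the SDK's code: it assumes the delay doubles after each throttled attempt and that no sleep follows the final attempt, neither of which is stated above. The helper name backoff_delays is hypothetical.

```python
from typing import List

MAX_ATTEMPTS = 6
INITIAL_DELAY = 4  # seconds
MAX_DELAY = 240  # seconds


def backoff_delays(
    max_attempts: int = MAX_ATTEMPTS,
    initial_delay: float = INITIAL_DELAY,
    max_delay: float = MAX_DELAY,
) -> List[float]:
    """Return the sleep taken before each retry (assumes a doubling factor)."""
    delays: List[float] = []
    delay = initial_delay
    # The last attempt is not followed by a sleep, so there are
    # max_attempts - 1 waits in total.
    for _ in range(max_attempts - 1):
        delays.append(delay)
        delay = min(delay * 2, max_delay)  # cap the delay at max_delay
    return delays


print(backoff_delays())  # [4, 8, 16, 32, 64]
```

Under these assumptions the default settings never reach MAX_DELAY; the cap only takes effect once max_attempts is raised to 8 or more.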

Related Issues and PRs


Implementation Requirements

1. Create Retry Strategy Classes

Location: src/strands/agent/retry.py or a similar appropriate location (the "Files to Modify" section below lists src/strands/hooks/retry.py)

Create a base retry strategy and built-in implementations:

ModelRetryStrategy (Default)

  • Implements HookProvider protocol
  • Configurable parameters:
    • max_attempts (default: 6)
    • initial_delay (default: 4 seconds)
    • max_delay (default: 240 seconds)
  • Implements exponential backoff for ModelThrottledException
  • Registers callback for AfterModelCallEvent
  • Sets event.retry = True on throttling exceptions (respecting max_attempts)
  • Includes logging for retry attempts (using SDK logging standards)
  • Supports async sleep during backoff delays
  • Must emit EventLoopThrottleEvent for backwards compatibility. Because hooks cannot emit events, the emission may need to stay hardcoded in the agent loop.
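
A minimal sketch of how such a strategy might look. The classes ModelThrottledException, AfterModelCallEvent and HookRegistry below are simplified stand-ins defined only so the example runs on its own; their shapes (an exception field on the event, an add_callback method, support for async callbacks) are assumptions, as are the doubling factor and the reset-on-success behaviour. EventLoopThrottleEvent emission is left out because hooks cannot emit events.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


# --- Stand-ins for SDK types (assumed shapes, illustration only) ---
class ModelThrottledException(Exception):
    pass


@dataclass
class AfterModelCallEvent:
    exception: Optional[Exception] = None
    retry: bool = False


class HookRegistry:
    def __init__(self) -> None:
        self.callbacks: Dict[type, List[Callable[[Any], Awaitable[None]]]] = {}

    def add_callback(self, event_type: type, callback: Callable[[Any], Awaitable[None]]) -> None:
        self.callbacks.setdefault(event_type, []).append(callback)


# --- The strategy itself ---
class ModelRetryStrategy:
    def __init__(self, max_attempts: int = 6, initial_delay: float = 4, max_delay: float = 240) -> None:
        self.max_attempts = max_attempts
        self.initial_delay = initial_delay
        self.max_delay = max_delay
        self._failed_attempts = 0

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(AfterModelCallEvent, self._on_after_model_call)

    async def _on_after_model_call(self, event: AfterModelCallEvent) -> None:
        if not isinstance(event.exception, ModelThrottledException):
            # Success or an unrelated error: reset the counter and do nothing.
            self._failed_attempts = 0
            return
        self._failed_attempts += 1
        if self._failed_attempts >= self.max_attempts:
            # Out of attempts: leave retry False so the exception propagates.
            return
        # Assumed doubling backoff, capped at max_delay.
        delay = min(self.initial_delay * 2 ** (self._failed_attempts - 1), self.max_delay)
        logger.debug("attempt=<%d>, delay=<%s> | throttled, retrying", self._failed_attempts, delay)
        await asyncio.sleep(delay)
        event.retry = True
```

Keeping the attempt counter on the strategy instance is the simplest option, but it means one instance should not be shared between agents that run concurrently.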

Naming: Choose a better name than "ModelRetrys" - something that represents model retries but isn't throttling-specific (e.g., ModelRetryStrategy, RetryStrategy, etc.)

NoopRetryStrategy

  • Implements HookProvider protocol
  • No-op implementation for users who want to explicitly disable retries
  • register_hooks() does nothing
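
As a sketch, the no-op strategy could be as small as the class below. The register_hooks signature (a registry argument plus keyword arguments) is an assumption about the HookProvider protocol.

```python
from typing import Any


class NoopRetryStrategy:
    """Registers no callbacks, so nothing ever sets event.retry."""

    def register_hooks(self, registry: Any, **kwargs: Any) -> None:
        # Intentionally empty: with no callback registered for
        # AfterModelCallEvent, a failed model call is never retried.
        return None
```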

2. Update Agent Class

Location: src/strands/agent/agent.py

Add retry_strategy parameter to Agent.__init__():

def __init__(
    self,
    # ... existing parameters ...
    retry_strategy: Optional[HookProvider] = None,
    # ... other parameters ...
):

Behavior:

  • If retry_strategy is None: Default to ModelRetryStrategy() with current defaults (6 attempts, 4s initial, 240s max)
  • Store as read-only property: self._retry_strategy
  • Register retry_strategy as a hook like any other HookProvider
  • Type hint as HookProvider (or create more specific RetryStrategy protocol if needed)

Integration:

  • May need to access retry_strategy from event loop for backwards compatibility (emitting EventLoopThrottleEvent)
  • Works alongside other hooks - retry_strategy is just another registered hook
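
The behaviour above might be wired up roughly as follows. AgentSketch, HookRegistry and the trimmed-down ModelRetryStrategy are stand-ins written for this example, not the SDK classes, and add_hook is an assumed registry method. The sketch reads "read-only property" as a public retry_strategy property backed by self._retry_strategy, which is one possible interpretation.

```python
from typing import List, Optional, Protocol


class HookProvider(Protocol):
    def register_hooks(self, registry: "HookRegistry") -> None: ...


class HookRegistry:
    """Stand-in registry that records providers in registration order."""

    def __init__(self) -> None:
        self.providers: List[HookProvider] = []

    def add_hook(self, provider: HookProvider) -> None:
        self.providers.append(provider)
        provider.register_hooks(self)


class ModelRetryStrategy:
    """Stand-in holding only the configuration; callbacks are omitted."""

    def __init__(self, max_attempts: int = 6, initial_delay: float = 4, max_delay: float = 240) -> None:
        self.max_attempts = max_attempts
        self.initial_delay = initial_delay
        self.max_delay = max_delay

    def register_hooks(self, registry: HookRegistry) -> None:
        pass


class AgentSketch:
    def __init__(
        self,
        hooks: Optional[List[HookProvider]] = None,
        retry_strategy: Optional[HookProvider] = None,
    ) -> None:
        self.hooks = HookRegistry()
        # None falls back to the built-in strategy with the current defaults.
        self._retry_strategy = retry_strategy if retry_strategy is not None else ModelRetryStrategy()
        # The strategy is registered like any other hook provider.
        self.hooks.add_hook(self._retry_strategy)
        for provider in hooks or []:
            self.hooks.add_hook(provider)

    @property
    def retry_strategy(self) -> HookProvider:
        # No setter is defined, so assignment raises AttributeError.
        return self._retry_strategy
```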

3. Refactor Event Loop

Location: src/strands/event_loop/event_loop.py

Remove:

  • MAX_ATTEMPTS = 6
  • INITIAL_DELAY = 4
  • MAX_DELAY = 240
  • Hardcoded throttling retry logic in _handle_model_execution()

Refactor:

  • Move retry logic from event loop to ModelRetryStrategy hook
  • Keep the retry loop structure but rely on hooks setting AfterModelCallEvent.retry
  • Ensure EventLoopThrottleEvent is still emitted (may need special handling for built-in ModelRetryStrategy)
  • The event loop should be simpler - just invoke hooks and respect the retry field
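
A sketch of what the simplified loop might reduce to once the backoff logic lives in a hook. The function name call_model_with_retries, the plain list of callbacks, and the AfterModelCallEvent stand-in are all hypothetical; the real _handle_model_execution() does considerably more (streaming, tracing, event emission).

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class AfterModelCallEvent:  # stand-in with an assumed shape
    exception: Optional[Exception] = None
    retry: bool = False


Callback = Callable[[AfterModelCallEvent], Awaitable[None]]


async def call_model_with_retries(
    call_model: Callable[[], Awaitable[str]],
    callbacks: List[Callback],
) -> str:
    while True:
        event = AfterModelCallEvent()
        result: Optional[str] = None
        try:
            result = await call_model()
        except Exception as exc:
            # Record the failure and let the hooks decide what happens next.
            event.exception = exc

        for callback in callbacks:
            await callback(event)

        if event.retry:
            continue  # a hook asked for another attempt
        if event.exception is not None:
            raise event.exception  # nobody asked to retry: propagate
        return result  # type: ignore[return-value]
```

In this shape the loop has no attempt limit of its own; bounding the number of retries is entirely the responsibility of the registered hooks.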

4. Backwards Compatibility

Critical Requirement: Existing code relying on EventLoopThrottleEvent must continue to work.

Approach:

  • ModelRetryStrategy must emit EventLoopThrottleEvent during retries
  • May need to check if retry_strategy is the built-in ModelRetryStrategy for special event handling
  • Default behavior (when retry_strategy=None) must be identical to current behavior

5. Testing

Location: tests/strands/agent/ and tests/strands/event_loop/

Required Test Scenarios:

  1. Default behavior: Verify that not specifying retry_strategy uses default ModelRetryStrategy with 6 attempts
  2. Custom retry strategy: Test a user-implemented custom retry strategy
  3. Backwards compatibility: Verify that EventLoopThrottleEvent is emitted as before
  4. NoopRetryStrategy: Test that retries can be disabled
  5. Configured parameters: Test ModelRetryStrategy with custom max_attempts, initial_delay, max_delay
  6. Integration with other hooks: Verify retry_strategy works alongside other hooks (no special interaction tests needed, just basic compatibility)

Files to Modify

  1. src/strands/hooks/retry.py (new file)

    • Create ModelRetryStrategy class
    • Create NoopRetryStrategy class
    • Implement HookProvider protocol
    • Handle retry logic and event emission
  2. src/strands/agent/agent.py

    • Add retry_strategy parameter to __init__()
    • Add _retry_strategy read-only property
    • Register retry_strategy as hook
  3. src/strands/event_loop/event_loop.py

    • Remove hardcoded constants
    • Refactor _handle_model_execution() to rely on hooks
    • Simplify retry loop logic
  4. tests/strands/hooks/test_retry.py (new file)

    • Test ModelRetryStrategy with default and custom parameters
    • Test NoopRetryStrategy
    • Test custom retry strategy implementation
  5. tests/strands/agent/test_agent_retry_strategy.py (new file or add to existing)

    • Test Agent initialization with retry_strategy
    • Test backwards compatibility
    • Test EventLoopThrottleEvent emission
  6. tests/strands/event_loop/test_event_loop_retry.py (update existing)

    • Update existing retry tests to work with new system
    • Test backwards compatibility
  7. Documentation (location TBD)

    • User guide for retry_strategy feature
    • Examples of custom retry strategies

Acceptance Criteria

  • ModelRetryStrategy class implements HookProvider and handles throttling retries
  • NoopRetryStrategy class implements HookProvider with no-op behavior
  • Agent accepts retry_strategy parameter (defaults to ModelRetryStrategy)
  • Hardcoded retry constants removed from event_loop.py
  • Event loop refactored to rely on hook-based retries
  • EventLoopThrottleEvent still emitted for backwards compatibility
  • Tests pass for default behavior with same retry characteristics as before
  • Tests pass for custom retry strategy implementation
  • Tests verify backwards compatibility (EventLoopThrottleEvent emission)
  • Tests pass for NoopRetryStrategy
  • Tests pass for configured retry parameters
  • Documentation created for retry_strategy feature
  • All existing tests continue to pass
  • Code follows SDK patterns (logging, type hints, docstrings)
  • Pre-commit hooks pass (formatting, linting, type checking)

Technical Approach

Implementation Strategy

  1. Create retry strategy classes with HookProvider protocol
  2. Integrate into Agent by adding retry_strategy parameter and registering as hook
  3. Refactor event loop to remove hardcoded logic and rely on hooks
  4. Ensure backwards compatibility by emitting EventLoopThrottleEvent
  5. Write comprehensive tests covering all scenarios
  6. Document the feature with examples

Key Design Decisions

  • Hook-based approach: Retry strategies are HookProviders, registered like any other hook
  • Read-only property: Store as _retry_strategy for potential backwards-compatibility access
  • Default behavior preserved: None defaults to ModelRetryStrategy with current settings
  • Explicit disable: Use NoopRetryStrategy instead of None to disable retries
  • Backwards compatible: EventLoopThrottleEvent emission preserved

Integration Points

  • Retry strategies integrate via the existing hook system
  • No special handling needed for interaction with other hooks
  • Event loop continues to respect AfterModelCallEvent.retry field
  • ModelRetryStrategy sets retry field based on its configuration

Notes

  • The name "ModelRetrys" should be improved to something more generic that represents model retries without being throttling-specific
  • This feature enables users to implement sophisticated retry logic beyond throttling (rate limiting, circuit breakers, custom backoff strategies, etc.)
  • The hook-based approach maintains consistency with SDK patterns and provides maximum flexibility
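
To illustrate the kind of custom strategy this would allow, here is a hypothetical RetryBudgetStrategy that retries chosen exception types until a fixed budget is spent. The AfterModelCallEvent stand-in and the registry's add_callback method are assumed shapes, not confirmed SDK signatures.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple, Type


@dataclass
class AfterModelCallEvent:  # stand-in with an assumed shape
    exception: Optional[Exception] = None
    retry: bool = False


class RetryBudgetStrategy:
    """Hypothetical custom strategy: retry selected errors until a budget runs out."""

    def __init__(self, retry_on: Tuple[Type[Exception], ...], budget: int = 3) -> None:
        self.retry_on = retry_on
        self.remaining = budget

    def register_hooks(self, registry: Any) -> None:
        # add_callback is assumed to exist on the registry.
        registry.add_callback(AfterModelCallEvent, self.on_after_model_call)

    def on_after_model_call(self, event: AfterModelCallEvent) -> None:
        # isinstance(None, ...) is False, so successful calls are ignored.
        if isinstance(event.exception, self.retry_on) and self.remaining > 0:
            self.remaining -= 1
            event.retry = True
```

Such a strategy would be passed as retry_strategy=RetryBudgetStrategy((TimeoutError,), budget=2) under the proposed Agent signature.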

Labels: enhancement (New feature or request)
