feat(agent): add configurable retry_strategy for model calls #1424
Description
The current retry logic for handling ModelThrottledException is hardcoded in event_loop.py with fixed values (6 attempts, exponential backoff starting at 4s). This makes it impossible for users to customize retry behavior for their specific use cases, such as:
This PR refactors the hardcoded retry logic into a flexible, hook-based system that allows users to configure retry behavior via a new retry_strategy parameter on the Agent.
Public API Changes
Added a new retry_strategy parameter to Agent.__init__():
And we have the NoopRetryStrategy, which disables retries:
The retry_strategy parameter accepts any HookProvider that implements retry logic via the AfterModelCallEvent hook. Two built-in strategies are provided:
ModelRetryStrategy: exponential backoff retry with configurable parameters (default)
NoopRetryStrategy: disables retries entirely
Custom retry strategies can be implemented by creating a hook provider that sets event.retry = True on the AfterModelCallEvent when a retry should occur.
Backwards Compatibility
The general default behavior is unchanged: agents still retry up to 5 times (6 attempts in total) with the same exponential backoff. EventLoopThrottleEvent and ForceStopEvent are still emitted during retries, maintaining backwards compatibility with existing hooks that listen for these events.
The exact delay times have changed, however. Because of a bug in the original logic, the initial delay was doubled the first time it executed (see test_agent_events.py for the test changes to accommodate this). Prior to these changes, the delays were:
After these changes, the delays are:
I think these are acceptable changes to make.
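To illustrate the kind of shift described above, here is a minimal sketch of an exponential backoff schedule with and without an extra doubling before the first wait. The 4s starting delay and the 5 retries come from the description; the growth factor of 2, the absence of a maximum delay, and the function name backoff_delays are assumptions for illustration, so the output is not necessarily the schedule this PR produces.

```python
def backoff_delays(initial_delay, retries, double_before_wait):
    """Return the wait in seconds before each retry.

    double_before_wait=True models a schedule where the delay is doubled
    before it is first used (an assumed reading of the original bug).
    double_before_wait=False uses the initial delay as-is for the first wait.
    The growth factor of 2 and the lack of a cap are assumptions.
    """
    delays = []
    delay = initial_delay
    for _ in range(retries):
        if double_before_wait:
            delay *= 2
            delays.append(delay)
        else:
            delays.append(delay)
            delay *= 2
    return delays


print(backoff_delays(4, 5, double_before_wait=True))   # [8, 16, 32, 64, 128]
print(backoff_delays(4, 5, double_before_wait=False))  # [4, 8, 16, 32, 64]
```

Under these assumptions every wait is halved once the extra doubling is removed, while the number of attempts stays the same.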
Implementation Decisions
We preserve backwards compatibility by emitting EventLoopThrottleEvent and ForceStopEvent as we used to.
We do not emit EventLoopThrottleEvent for other retry strategies.
We do emit ForceStopEvent whenever an exception bubbles out of the model invocation.
Hooks are the way that you implement a retry strategy:
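A minimal sketch of the idea, not the SDK's actual code: a callback inspects the event after a model call and sets event.retry = True when another attempt should be made. ThrottledError, FakeAfterModelCallEvent, and FixedAttemptsRetry are hypothetical stand-ins defined here so the example runs on its own. The real AfterModelCallEvent may expose different fields, and registering the callback through a HookProvider is omitted.

```python
from dataclasses import dataclass
from typing import Optional


class ThrottledError(Exception):
    """Hypothetical stand-in for ModelThrottledException."""


@dataclass
class FakeAfterModelCallEvent:
    """Hypothetical stand-in for AfterModelCallEvent (assumed fields only)."""
    exception: Optional[Exception] = None
    retry: bool = False


class FixedAttemptsRetry:
    """Requests a retry on throttling until max_attempts calls have been made."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.attempts = 0

    def on_after_model_call(self, event: FakeAfterModelCallEvent) -> None:
        self.attempts += 1
        if event.exception is None:
            # The call succeeded: reset the counter for the next model call.
            self.attempts = 0
            return
        throttled = isinstance(event.exception, ThrottledError)
        if throttled and self.attempts < self.max_attempts:
            # Signal the caller to invoke the model again.
            event.retry = True


strategy = FixedAttemptsRetry(max_attempts=3)
event = FakeAfterModelCallEvent(exception=ThrottledError())
strategy.on_after_model_call(event)
print(event.retry)  # True
```

A real strategy would typically also wait between attempts. The sketch leaves that out to keep the focus on the retry flag.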
Naming
retry_strategy: so that as hooks are expanded to allow retrying tools, we can also enable tool retry strategies.
ModelRetryStrategy: since it's only focused on model retries. In the future we might vend other strategies, but we can add a new strategy rather than attempting to fit it all into this one.
NoopRetryStrategy: because it's the easy way to turn retries off.
Related Issues
Documentation PR
Type of Change
Bug fix
New feature
Breaking change
Documentation update
Other (please describe):
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.