feat(client): ✨ implement cross-provider model fallback system#91
feat(client): ✨ implement cross-provider model fallback system#91
Conversation
This introduces a robust mechanism to define and utilize groups of equivalent models across different providers. If a request is made to a model that belongs to a fallback group, the client can now seamlessly rotate through other providers/models in that group upon exhaustion of credentials or errors. - Add `FallbackGroupManager` to parse and index fallback configurations from init args or the `MODEL_FALLBACK_GROUPS` environment variable. - Implement `_execute_with_fallback` in `RotatingClient` for standard completions, handling credential swapping and tier prioritization across providers. - Implement `_streaming_with_fallback` to support transparent failover during streaming responses. - Integrate fallback logic into the main `acompletion` entry point.
|
I'm diving into the new cross-provider model fallback system. Looking forward to reviewing the and how it integrates into the . Back soon with the full report! |
src/rotator_library/client.py
Outdated
|
|
||
| # Temporarily swap credentials for this provider | ||
| original_creds = self.all_credentials.get(provider, []) | ||
| self.all_credentials[provider] = tier_creds |
There was a problem hiding this comment.
CRITICAL: This is a major concurrency race condition. self.all_credentials is shared across all requests. Swapping it globally will affect every other concurrent request to this provider, causing them to use only the credentials from this fallback tier.
To fix this properly, _execute_with_retry should be updated to accept an optional credentials_override list, avoiding global state modification during request execution.
src/rotator_library/client.py
Outdated
|
|
||
| # Temporarily swap credentials for this provider | ||
| original_creds = self.all_credentials.get(provider, []) | ||
| self.all_credentials[provider] = tier_creds |
There was a problem hiding this comment.
CRITICAL: Same race condition as in _execute_with_fallback. Global modification of self.all_credentials is not safe for concurrent streaming requests.
| group = self._groups[group_idx] | ||
|
|
||
| # Move target to front, preserve relative order of others | ||
| reordered = [provider_model] |
There was a problem hiding this comment.
Consider deduplicating the group members here. If an entry appears multiple times in the group config, it will currently appear multiple times in the returned list (except for the promoted target).
| reordered = [provider_model] | |
| # Move target to front, preserve relative order of others, and deduplicate | |
| reordered = [provider_model] | |
| seen = {provider_model} | |
| for entry in group: | |
| if entry not in seen: | |
| reordered.append(entry) | |
| seen.add(entry) |
There was a problem hiding this comment.
Overall Assessment
This PR introduces a powerful cross-provider model fallback system, which is a great addition for increasing the reliability of the client. The separation of the fallback logic into FallbackGroupManager is clean and well-structured.
However, I've identified a critical concurrency issue in how credentials are swapped during the fallback process (see my previous inline comments on lines 2062 and 2221 in client.py). Since self.all_credentials is a shared instance variable, modifying it during a request will interfere with any other concurrent requests being handled by the same RotatingClient instance. This needs to be addressed before the PR can be merged.
Architectural Feedback
- The tier-aware fallback logic correctly respects provider priorities while allowing cross-provider rotation.
- Critical Concern: The reliance on globally swapping
self.all_credentialsmust be refactored to use local state/arguments for the retry logic to ensure thread/async safety.
Key Suggestions
- Refactor
_execute_with_retryand_streaming_acompletion_with_retryto accept an explicitcredentialslist instead of reading fromself.all_credentials. - Improve the robustness of
FallbackGroupManagerby deduplicating and normalizing config entries (see suggestion on line 137). - Consider if
aembeddingshould also benefit from these fallback groups for consistency. - In
_streaming_with_fallback, consider if we should try other fallback entries if the first one returns only a[DONE]signal without any actual data.
Questions for the Author
- Was the exclusion of
aembeddingfrom the fallback system intentional? - For streaming, do you think we should try other fallback entries if the first one returns only a
[DONE]signal without any actual data?
This review was generated by an AI assistant.
- Update `.env.example` with `MODEL_FALLBACK_GROUPS` configuration template and detailed usage comments. - Add comprehensive section to `DOCUMENTATION.md` explaining the fallback algorithm, tier-first ordering, and target promotion logic. - Update project and library `README.md` files to highlight the new feature and provide updated `RotatingClient` instantiation examples.
Updates retry mechanisms to accept an optional `credentials_for_provider` argument. This eliminates the need to temporarily mutate `self.all_credentials` during fallback attempts, improving thread safety and code clarity. Also in this commit: - fix(core): deduplicate entries when generating fallback groups
Refactor the cross-provider fallback system to use a sequential provider approach instead of global tier-based ordering. - Iterate through fallback entries sequentially, allowing each provider to complete its internal rotation (Tier 2 then Tier 1) before moving to the next. - Remove `credentials_for_provider` injection logic, simplifying retry methods and ensuring provider-specific settings are respected. - Update documentation and examples to reflect the new deterministic behavior. - Adjust logging levels for better clarity during fallback operations.
This introduces a robust mechanism to define and utilize groups of equivalent models across different providers. If a request is made to a model that belongs to a fallback group, the client can now seamlessly rotate through other providers/models in that group upon exhaustion of credentials or errors.
FallbackGroupManagerto parse and index fallback configurations from init args or theMODEL_FALLBACK_GROUPSenvironment variable._execute_with_fallbackinRotatingClientfor standard completions, handling credential swapping and tier prioritization across providers._streaming_with_fallbackto support transparent failover during streaming responses.acompletionentry point.Related #85
Important
Introduces a cross-provider model fallback system in
RotatingClient, enabling seamless credential rotation across providers for equivalent models.RotatingClient.FallbackGroupManagerto manage fallback configurations from init args orMODEL_FALLBACK_GROUPSenv variable._execute_with_fallbackand_streaming_with_fallbackfor handling fallback logic inclient.py.model_fallback_groupsparameter orMODEL_FALLBACK_GROUPSenv variable..env.examplewith fallback group configuration examples.DOCUMENTATION.mdandREADME.mdto include details on cross-provider fallback groups.src/rotator_library/README.md.client.pyandfallback_groups.py.This description was created by
for 96c4bdb. You can customize this summary. It will automatically update as commits are pushed.