Commit 6782e8f

fix: Don't retry for non-recoverable server http errors (#212)
# What does this PR do?

This specifically addresses the issue where a server returning Not Implemented (code 501) would receive two more attempts for the same request, even though there is no reason to expect it to serve the request any better on further attempts.

This patch reduces the set of >=500 codes that are retried to those where there seems to be a chance of recovery on further attempts. These codes are now explicitly listed instead of using a broad >=500 filter.

For all possible server error codes, consult e.g. https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status#server_error_responses

## Test Plan

The client no longer retries requests when the server returns Not Implemented.

Injected a fake `raise NotImplementedError` in one of the providers, then triggered the API. Confirmed that the server now logs the following once rather than three times:

```
"POST /v1/datasets HTTP/1.1" 501 Not Implemented
```

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
1 parent 0d4dc64 commit 6782e8f

1 file changed (+5, -1 lines changed)

src/llama_stack_client/_base_client.py

Lines changed: 5 additions & 1 deletion
@@ -773,7 +773,11 @@ def _should_retry(self, response: httpx.Response) -> bool:
             return True
 
         # Retry internal errors.
-        if response.status_code >= 500:
+        if response.status_code in (
+            502,  # Bad Gateway
+            503,  # Service Unavailable
+            504,  # Gateway Timeout
+        ):
             log.debug("Retrying due to status code %i", response.status_code)
             return True
 
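
The behavioral difference in this hunk can be sketched in isolation. The snippet below is a minimal illustration, not the client's actual method: the names `RETRYABLE_SERVER_ERRORS`, `retries_before_patch`, and `retries_after_patch` are hypothetical, and it models only the branch touched by the diff, ignoring any earlier checks in `_should_retry` that are not shown here. Note that with the explicit list, 500 itself is no longer retried by this branch either.

```python
# Hypothetical standalone sketch of the changed branch only; the real
# _should_retry takes an httpx.Response and has other checks not shown here.

# Assumed name: the explicit set of server error codes listed in the patch.
RETRYABLE_SERVER_ERRORS = frozenset(
    {
        502,  # Bad Gateway
        503,  # Service Unavailable
        504,  # Gateway Timeout
    }
)


def retries_before_patch(status_code: int) -> bool:
    """Old behavior of this branch: any 5xx status triggers a retry."""
    return status_code >= 500


def retries_after_patch(status_code: int) -> bool:
    """New behavior of this branch: only the explicitly listed codes retry."""
    return status_code in RETRYABLE_SERVER_ERRORS


if __name__ == "__main__":
    # Compare both predicates across a few server error codes.
    for code in (500, 501, 502, 503, 504, 505):
        print(code, retries_before_patch(code), retries_after_patch(code))
```

Listing the codes explicitly means any server error status not named in the set falls through without a retry from this branch, which matches the intent stated in the commit message.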
