-
Notifications
You must be signed in to change notification settings - Fork 13
[AIENG-73] Disable retry when ReadTimeout occurs #992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request updates the HTTP client’s retry strategy and configuration. In the HTTP client module, the import statement now includes the new environment key, and a constant Changes
Sequence Diagram(s)sequenceDiagram
participant C as Caller
participant H as _HttpClient
participant E as Environment
C->>H: Instantiate _HttpClient(base_url, session, test_runner, app)
H->>E: Check for SKIP_TIMEOUT_RETRY variable
alt SKIP_TIMEOUT_RETRY is set
H->>H: Set read timeout to 0
else
H->>H: Set read timeout to MAX_RETRIES (3)
end
H->>C: Return configured _HttpClient instance
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (14)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Konboi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are cases where a retry should be performed even after a timeout, especially for the subset requests.
Since loading the model takes time, the first request may time out.
However, in many cases, the model loading is completed by the second third request, so a response can be returned without timing out.
Don’t you think that the current modification would prevent us from handling such (subset) cases?
|
Oh, I didn't know that. Thank you for your information. Therefore, I'll change the code so that requests are retried when the CLI requests test subsets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
launchable/utils/http_client.py (1)
131-131:⚠️ Potential issueVerify handling of session parameter in request method
Line 131 uses
self.sessionto make the request, but the method now accepts a customsessionparameter. You should use thesessionparameter that was set up earlier in the method instead ofself.session.- response = self.session.request(method, url, headers=headers, timeout=timeout, data=data, - params=params, verify=(not self.skip_cert_verification)) + response = session.request(method, url, headers=headers, timeout=timeout, data=data, + params=params, verify=(not self.skip_cert_verification))
🧹 Nitpick comments (2)
launchable/utils/http_client.py (2)
94-100: Consider potential duplication in retry configurationThere's some duplication between the retry configuration in
__init__and in therequestmethod. While they serve different purposes, it might be worth refactoring to have a single function that generates the appropriate retry strategy based on the context.- strategy = Retry( - total=MAX_RETRIES, - read=read, - allowed_methods=["GET", "PUT", "PATCH", "DELETE"], - status_forcelist=[429, 500, 502, 503, 504], - backoff_factor=2 - ) + strategy = self._create_retry_strategy(read) # Then add this method to the class + def _create_retry_strategy(self, read=0): + return Retry( + total=MAX_RETRIES, + read=read, + allowed_methods=["GET", "PUT", "PATCH", "DELETE"], + status_forcelist=[429, 500, 502, 503, 504], + backoff_factor=2 + )
85-107: Consider resource management for created sessionsWhen creating a new session object in the
requestmethod, there's no explicit cleanup of these sessions. While Python's garbage collection will eventually clean them up, it would be better to ensure that sessions are closed properly, especially if this method is called frequently.You could handle this by using a context manager approach:
if session is None: # Create the retry strategy and adapter as before s = Session() s.mount("http://", adapter) s.mount("https://", adapter) session = s # Mark that we need to close this session should_close_session = True else: should_close_session = False try: # Use the session as before response = session.request(...) return response finally: if should_close_session: session.close()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
launchable/utils/http_client.py(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (windows-latest, 3.10)
- GitHub Check: build (windows-latest, 3.9)
- GitHub Check: build (windows-latest, 3.8)
- GitHub Check: build (windows-latest, 3.7)
- GitHub Check: build (windows-latest, 3.6)
- GitHub Check: build (ubuntu-20.04, 3.12)
- GitHub Check: build (ubuntu-20.04, 3.11)
- GitHub Check: build (ubuntu-20.04, 3.10)
- GitHub Check: build (ubuntu-20.04, 3.9)
- GitHub Check: build (ubuntu-20.04, 3.8)
- GitHub Check: build (ubuntu-20.04, 3.7)
- GitHub Check: build (ubuntu-20.04, 3.6)
🔇 Additional comments (6)
launchable/utils/http_client.py (6)
11-11: Good update to import directly from urllib3The import path for
Retryhas been updated to import directly fromurllib3.util.retryinstead of throughrequests.packages. This is a cleaner approach as it leverages the direct dependency.
27-28: Good refactoring by introducing a constant for retry configurationIntroducing the
MAX_RETRIESconstant improves maintainability by centralizing the retry configuration value.
56-59: Implementation aligns with PR objective to disable read timeout retriesSetting
read=0in the retry strategy effectively disables retries when ReadTimeout occurs, which matches the PR objective to prevent execution delays when the Launchable server is unstable.
83-83: Good enhancement to the request method signatureAdding an optional session parameter provides more flexibility by allowing callers to provide their own session with custom retry configurations.
85-107: Well-implemented session handling with specialized retry strategyThe implementation correctly:
- Creates a new session with custom retry strategy when session is not provided
- Disables read timeout retries by default (read=0)
- Makes a special exception for the "subset" path to allow retries
This approach aligns perfectly with the PR objective while maintaining functionality for subset requests that require retries.
89-94:Details
✅ Verification successful
Good implementation of subset path exception
The special handling for the "subset" path is well implemented and documented. This ensures that retries are still enabled for subset requests where they are needed.
🏁 Script executed:
#!/bin/bash # Let's confirm if the subset path is used in the codebase to validate this special case # Search for direct references to the subset endpoint echo "Searching for direct references to subset endpoint..." rg -l "path\s*=\s*['\"]subset['\"]" --type py # Search for more general usage of the subset keyword echo "Searching for general subset keyword usage..." rg "\bsubset\b" --type pyLength of output: 33344
Subset Request Exception Handling Approved
The implementation in
launchable/utils/http_client.pyclearly distinguishes the "subset" path by settingread = MAX_RETRIESwith an appropriate comment explaining why extra retries are warranted. The code and its documentation have been verified against the codebase usage, and no issues were found.
ca14ea7 to
7ddb7e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
launchable/commands/subset.py (1)
527-530: Consider revisiting the TODO comment about extended timeout.This TODO comment indicates a temporary workaround for slow API responses. While unrelated to the current PR's focus on retry behavior, it would be good to track this as a separate issue if it hasn't been addressed yet.
Consider adding a ticket/issue reference to this TODO comment for better traceability:
- # temporarily extend the timeout because subset API response has become slow - # TODO: remove this line when API response return respose - # within 300 sec + # temporarily extend the timeout because subset API response has become slow + # TODO: [AIENG-XX] Remove this extended timeout when API responses return within 300 sec
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
launchable/commands/subset.py(2 hunks)launchable/utils/http_client.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- launchable/utils/http_client.py
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: build (windows-latest, 3.12)
- GitHub Check: build (windows-latest, 3.11)
- GitHub Check: build (windows-latest, 3.10)
- GitHub Check: build (windows-latest, 3.9)
- GitHub Check: build (windows-latest, 3.8)
- GitHub Check: build (windows-latest, 3.7)
- GitHub Check: build (windows-latest, 3.6)
- GitHub Check: build (ubuntu-20.04, 3.12)
- GitHub Check: build (ubuntu-20.04, 3.11)
- GitHub Check: build (ubuntu-20.04, 3.10)
- GitHub Check: build (ubuntu-20.04, 3.9)
- GitHub Check: build (ubuntu-20.04, 3.8)
- GitHub Check: build (ubuntu-20.04, 3.7)
- GitHub Check: build (ubuntu-20.04, 3.6)
🔇 Additional comments (4)
launchable/commands/subset.py (4)
11-12: Good addition of necessary imports for HTTP session management.These imports provide the required classes for configuring custom HTTP retry logic, which aligns with the PR objective to manage retry behavior.
16-16: LGTM: Import of retry utilities from the centralized HTTP client.The import of
MAX_RETRIESanddefault_retry_strategyfrom the HTTP client module enables consistent retry behavior configuration across the application.
523-523: Good: Properly passing the custom session to the LaunchableClient.The custom session with the configured retry strategy is correctly passed to the LaunchableClient, ensuring that the client uses the defined retry behavior for all requests.
512-520:Details
❓ Verification inconclusive
Appropriate implementation of custom retry strategy for subset requests.
This implementation correctly creates a custom HTTP session with a retry strategy specifically for subset requests. The strategy allows retries for read operations, which is an intentional exception to the PR's general objective of disabling retries for
ReadTimeouterrors. This is appropriate since, as the comments note, subset requests may require retries due to model loading time on the server side.I do have a question about the implementation details of
default_retry_strategy():
🏁 Script executed:
#!/bin/bash # Check the implementation of default_retry_strategy to understand how it handles ReadTimeout grep -rn "def default_retry_strategy" --include="*.py" . cat $(grep -l "def default_retry_strategy" --include="*.py" . | head -1)Length of output: 225
Attention: Validate
default_retry_strategyImplementation DetailsThe custom HTTP session and retry configuration for subset requests is implemented appropriately and intentionally allows retries on read operations. However, the script to verify the implementation of
default_retry_strategy()(located inlaunchable/utils/http_client.py) did not yield its code snippet due to a directory scanning error. Please ensure that:
- The implementation of
default_retry_strategy()properly configures retries for read operations (especially forReadTimeoutconditions) as intended for subset requests.- There are no unintended side effects on the general error-handling behavior elsewhere in the codebase.
It is recommended to run a revised check (for example, using a corrected script) or perform a manual review of the function’s details to confirm its behavior.
launchable/commands/subset.py
Outdated
| try: | ||
| test_runner = context.invoked_subcommand | ||
|
|
||
| strategy = default_retry_strategy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked split-subset logic in Launchable server, but it doesn't load the model in that API. Hence, I believe it's enough to retry requests in only subset API.
launchable/commands/subset.py
Outdated
| try: | ||
| test_runner = context.invoked_subcommand | ||
|
|
||
| strategy = default_retry_strategy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that the HTTP client logic is getting too involved here… even this is subset.py
Can we overwrite it before the request phase using path?
launchable/utils/http_client.py
Outdated
| return os.getenv(BASE_URL_KEY) or DEFAULT_BASE_URL | ||
|
|
||
|
|
||
| def default_retry_strategy(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m wondering if it’s really okay for the default behavior to not retry on read timeouts.
Wouldn’t it be better to enable retries by default, and provide an environment variable to skip retries specifically for read timeouts if needed?
e.g.)
SKIP_TIMEOUT_RETRY=true launchable xxxWhat do you think?
|
|
I modified code so that we can disable retrying requests by configuring a environment variable 👍 |
…12877) When a Launchable server is unstable, a ReadTimeoutException occurs in the Launchable CLI. In such case, the Launchable CLI retries requests, which slows down CI execution. In this PR, I configured the environment variable SKIP_TIMEOUT_RETRY to disable retry attempts on requests(Link: cloudbees-oss/smart-tests-cli#992). ``` WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='api.mercury.launchableinc.com', port=443): Read timed out. (read timeout=15)")': /intake/organizations/ruby/workspaces/ruby/commits/collect/options WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='api.mercury.launchableinc.com', port=443): Read timed out. (read timeout=15)")': /intake/organizations/ruby/workspaces/ruby/commits/collect/options WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='api.mercury.launchableinc.com', port=443): Read timed out. (read timeout=15)")': /intake/organizations/ruby/workspaces/ruby/commits/collect/options HTTPSConnectionPool(host='api.mercury.launchableinc.com', port=443): Max retries exceeded with url: /intake/organizations/ruby/workspaces/ruby/commits/collect/options (Caused by ReadTimeoutError("HTTPSConnectionPool(host='api.mercury.launchableinc.com', port=443): Read timed out. (read timeout=15)")) Exception in thread "main" javax.net.ssl.SSLException: Connection reset at sun.security.ssl.Alert.createSSLException(Alert.java:127) at sun.security.ssl.TransportContext.fatal(TransportContext.java:331) at sun.security.ssl.TransportContext.fatal(TransportContext.java:274) at sun.security.ssl.TransportContext.fatal(TransportContext.java:269) at sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1572) at sun.security.ssl.SSLSocketImpl.access$400(SSLSocketImpl.java:73) at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:982) at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) at com.launchableinc.ingest.commits.CommitGraphCollector.transfer(CommitGraphCollector.java:131) at com.launchableinc.ingest.commits.CommitIngester.run(CommitIngester.java:145) at com.launchableinc.ingest.commits.CommitIngester.main(CommitIngester.java:72) Suppressed: java.net.SocketException: Broken pipe (Write failed) at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) at java.net.SocketOutputStream.write(SocketOutputStream.java:155) at sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:81) at sun.security.ssl.TransportContext.fatal(TransportContext.java:362) ... 25 more Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:210) at java.net.SocketInputStream.read(SocketInputStream.java:141) at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464) at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68) at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1350) at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73) at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:966) ... 20 more Couldn't get commit history from `/github/workspace/src`. Do you run command root of git-controlled directory? If not, please set a directory use by --source option. HTTPSConnectionPool(host='api.mercury.launchableinc.com', port=443): Read timed out. (read timeout=60) Command '['java', '-jar', '/root/.local/pipx/venvs/launchable/lib/python3.10/site-packages/launchable/jar/exe_deploy.jar', 'ingest:commit', '-endpoint', ``` https://github.com/ruby/ruby/actions/runs/13572112090/job/37939529243



When Launchable server is unstable,
ReadTimeoutexception can occur. To prevent the execution from slowing down, we disable retrying the request in this case.Before
After
Summary by CodeRabbit