Fix memory leak in get_async_subtensor on initialization failure #3237
base: staging
Wrap the initialization step in a try-except block to ensure that if AsyncSubtensor.initialize() fails, the instance is properly closed before the exception propagates. This prevents resource leaks when get_async_subtensor is called repeatedly in a loop. Fixes opentensor#3235
Pull request overview
This PR fixes a memory leak in the get_async_subtensor factory function that occurs when the initialization step fails. Previously, if initialize() raised an exception, the partially-initialized AsyncSubtensor instance would be lost without proper cleanup, leaving resources like network connections and async tasks dangling.
Key Changes
- Added try-except block around `initialize()` call in `get_async_subtensor`
- Ensures `close()` is called on the AsyncSubtensor instance before re-raising initialization exceptions
- Prevents resource leaks when connection errors occur repeatedly in loops
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@basfroman Happy New Year,
@Dairus01 I am not sure that I noticed any errors in subtensor initialization, but memory grows by approximately 50 MB with each iteration.
@Dairus01 we ran the code on Ubuntu in Runpod. NAME="Ubuntu"
@Dairus01 @basfroman how to reproduce: main.py main.config.js You can see in the screenshot below that the app size increases every 10-15 seconds:

Memory Leak Fix in `get_async_subtensor`
Closes #3235
Problem Description
A memory leak was identified in the `bittensor` library when using the `get_async_subtensor` factory function in a loop, specifically when the initialization step fails.
When `get_async_subtensor` is called, it performs two main steps:
1. It creates an `AsyncSubtensor` object.
2. It calls the `.initialize()` method on this object to establish a connection to the Substrate node.
If the `.initialize()` method raises an exception (e.g., due to a connection error, timeout, or invalid configuration), the exception propagates immediately to the caller. Crucially, the partially initialized `AsyncSubtensor` instance, which may have already allocated resources such as an `AsyncSubstrateInterface` with pending tasks or open (but failing) connections, is lost.
Because the exception interrupts the flow before the instance is returned, the caller never receives the `subtensor` object. Consequently, the caller cannot call `.close()` on it. This leaves the underlying resources (like the `aiohttp` session or background tasks within `async_substrate_interface`) hanging, leading to a memory leak that becomes significant if the operation is retried in a loop.
Thought Process
Analysis of the Issue:
- The leak appears when `get_async_subtensor` is called repeatedly.
- The factory `get_async_subtensor` is defined in `bittensor/core/async_subtensor.py`.
- If `sub.initialize()` fails, the `sub` variable goes out of scope, but if `initialize` started async tasks that reference `self`, the object might be kept alive by the event loop or other references, preventing garbage collection. Explicit cleanup via `.close()` is required for `AsyncSubtensor`.
Reproduction:
- Mocked `AsyncSubstrateInterface` to simulate initialization failures.
- Confirmed that when `initialize` raises an exception, the `close` method of the underlying interface was not called.
Verification of Synchronous Implementation:
- Checked `bittensor/core/subtensor.py`. The synchronous `Subtensor` class is typically instantiated directly (`Subtensor(...)`). Its initialization happens in `__init__`, and if that fails, the object is generally not created or fully formed. More importantly, there isn't a factory function performing a two-step "create then async init" process that hides the instance on failure. Thus, the issue appeared specific to the async factory pattern.
Solution
The solution is to wrap the initialization step in a `try...except` block within the `get_async_subtensor` function. This ensures that if initialization fails, we explicitly close the `AsyncSubtensor` instance before allowing the exception to propagate.
Updated Code:
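The pattern described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's verbatim diff: `StubAsyncSubtensor`, its `created` registry, the `fail` flag, and `get_async_subtensor_sketch` are hypothetical stand-ins so the example runs without `bittensor` installed. Only the `initialize()` / `close()` method names come from the description.

```python
import asyncio


class StubAsyncSubtensor:
    """Hypothetical stand-in for AsyncSubtensor (not the real bittensor class)."""

    created = []  # hypothetical registry, used only to inspect instances afterwards

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.closed = False
        StubAsyncSubtensor.created.append(self)

    async def initialize(self):
        # Simulate a connection error during initialization.
        if self.fail:
            raise ConnectionError("could not reach node")
        return self

    async def close(self):
        # The real class would release connections and background tasks here.
        self.closed = True


async def get_async_subtensor_sketch(fail: bool = False) -> StubAsyncSubtensor:
    sub = StubAsyncSubtensor(fail=fail)
    try:
        await sub.initialize()
    except Exception:
        # Close the partially initialized instance, then re-raise unchanged.
        await sub.close()
        raise
    return sub
```

The bare `raise` keeps the original exception and traceback, so callers that already catch connection errors in a retry loop need no changes.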
By adding this error handling, we guarantee that any `AsyncSubtensor` created by this factory is either successfully initialized and returned, or properly closed and discarded. This prevents the resource leak even when connection errors occur repeatedly.
Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=201190161
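The reproduction step described under Thought Process could be checked along these lines with `unittest.mock.AsyncMock`. This is a sketch under assumptions: `factory_without_cleanup`, `factory_with_cleanup`, and `close_await_count` are hypothetical helpers, and the mock stands in for the client object rather than patching the real `AsyncSubstrateInterface`.

```python
import asyncio
from unittest.mock import AsyncMock


async def factory_without_cleanup(make_client):
    # Create-then-initialize with no cleanup: the instance is lost on failure.
    client = make_client()
    await client.initialize()
    return client


async def factory_with_cleanup(make_client):
    # Same pattern, but close() runs before the exception propagates.
    client = make_client()
    try:
        await client.initialize()
    except Exception:
        await client.close()
        raise
    return client


def close_await_count(factory) -> int:
    """Run a factory against a mock whose initialize() fails; count close() awaits."""
    client = AsyncMock()
    client.initialize.side_effect = ConnectionError("simulated failure")
    try:
        asyncio.run(factory(lambda: client))
    except ConnectionError:
        pass
    return client.close.await_count
```

Comparing the count for the two factories shows whether `close()` was awaited after a failed initialization.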