Dairus01 commented Dec 31, 2025

Memory Leak Fix in get_async_subtensor

Closes #3235

Problem Description

A memory leak was identified in the bittensor library when using the get_async_subtensor factory function in a loop, specifically when the initialization step fails.

When get_async_subtensor is called, it performs two main steps:

  1. Instantiates a new AsyncSubtensor object.
  2. Awaits the .initialize() method on this object to establish a connection to the Substrate node.

If the .initialize() method raises an exception (e.g., due to a connection error, timeout, or invalid configuration), the exception propagates immediately to the caller. Crucially, the partially initialized AsyncSubtensor instance—which may have already allocated resources such as an AsyncSubstrateInterface with pending tasks or open (but failing) connections—is lost.

Because the exception interrupts the flow before the instance is returned, the caller never receives the subtensor object. Consequently, the caller cannot call .close() on it. This leaves the underlying resources (like the aiohttp session or background tasks within async_substrate_interface) hanging, leading to a memory leak that becomes significant if the operation is retried in a loop.
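The failure mode described above can be reproduced without a network by using hypothetical stand-ins. This is a minimal sketch under those assumptions and is not bittensor code. FakeConnection plays the role of an allocated resource such as an open connection. FakeSubtensor mimics the create-then-initialize shape of AsyncSubtensor. get_fake_subtensor has the same shape as the pre-fix factory.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a network resource such as a websocket."""

    open_count = 0  # connections created but not yet closed

    def __init__(self) -> None:
        FakeConnection.open_count += 1
        self.closed = False

    async def close(self) -> None:
        if not self.closed:
            self.closed = True
            FakeConnection.open_count -= 1


class FakeSubtensor:
    """Hypothetical stand-in for AsyncSubtensor: allocates in __init__, connects in initialize()."""

    def __init__(self, fail: bool) -> None:
        self.conn = FakeConnection()  # allocated before initialize() runs
        self.fail = fail

    async def initialize(self) -> None:
        if self.fail:
            raise ConnectionError("simulated connection failure")

    async def close(self) -> None:
        await self.conn.close()


async def get_fake_subtensor(fail: bool) -> FakeSubtensor:
    # Same shape as the pre-fix factory: no cleanup if initialize() raises
    sub = FakeSubtensor(fail)
    await sub.initialize()
    return sub


async def count_leaked(attempts: int) -> int:
    """Retry a failing factory and return how many connections were left open."""
    before = FakeConnection.open_count
    for _ in range(attempts):
        try:
            sub = await get_fake_subtensor(fail=True)
            await sub.close()  # never reached: the factory raised first
        except ConnectionError:
            pass  # the caller holds no reference it could close
    return FakeConnection.open_count - before


if __name__ == "__main__":
    print(asyncio.run(count_leaked(5)))  # 5
```

Each failed attempt leaves one connection open because the caller never obtains a handle to close.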

Thought Process

  1. Analysis of the Issue:

    • The issue report highlighted a leak when get_async_subtensor is called repeatedly.
    • I examined the code for get_async_subtensor in bittensor/core/async_subtensor.py.
    • Code before fix:
      sub = AsyncSubtensor(network=network, ...)
      await sub.initialize()  # <--- If this fails, 'sub' is lost
      return sub
    • If sub.initialize() fails, the sub variable goes out of scope, but if initialize started async tasks that reference self, the object might be kept alive by the event loop or other references, preventing garbage collection. Explicit cleanup via .close() is required for AsyncSubtensor.
  2. Reproduction:

    • I created a reproduction script that mocked AsyncSubstrateInterface to simulate initialization failures.
    • I observed that when initialize raises an exception, the close method of the underlying interface was not called.
    • I confirmed that in a loop, this leads to an accumulation of unclosed objects.
  3. Verification of Synchronous Implementation:

    • I checked bittensor/core/subtensor.py. The synchronous Subtensor class is typically instantiated directly (Subtensor(...)), and its initialization happens in __init__. If that fails, the constructor raises and no instance is returned to the caller. More importantly, there is no factory function performing a two-step "create, then async-initialize" process that hides the instance on failure. Thus, the issue appears specific to the async factory pattern.
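A mock-based reproduction along the lines of step 2 might look like the sketch below. The approach here is an assumption. Rather than patching AsyncSubstrateInterface, it mocks a subtensor-shaped object directly. make_failing_subtensor, factory_before_fix and factory_after_fix are hypothetical helpers that mirror the factory body before and after the change.

```python
import asyncio
from typing import Awaitable, Callable, List
from unittest.mock import AsyncMock, MagicMock


def make_failing_subtensor() -> MagicMock:
    """Hypothetical mock shaped like AsyncSubtensor whose initialize() always fails."""
    sub = MagicMock()
    sub.initialize = AsyncMock(side_effect=TimeoutError("simulated timeout"))
    sub.close = AsyncMock()
    return sub


async def factory_before_fix(sub: MagicMock) -> MagicMock:
    # Pre-fix shape: an exception from initialize() skips close() entirely
    await sub.initialize()
    return sub


async def factory_after_fix(sub: MagicMock) -> MagicMock:
    # Post-fix shape: close() runs before the exception is re-raised
    try:
        await sub.initialize()
    except Exception:
        await sub.close()
        raise
    return sub


async def reproduce(
    factory: Callable[[MagicMock], Awaitable[MagicMock]], iterations: int
) -> List[MagicMock]:
    created = []
    for _ in range(iterations):
        sub = make_failing_subtensor()
        created.append(sub)  # the test keeps a handle; a real caller would not have one
        try:
            await factory(sub)
        except TimeoutError:
            pass  # simulate a retry loop that swallows the error
    return created


if __name__ == "__main__":
    before = asyncio.run(reproduce(factory_before_fix, 3))
    after = asyncio.run(reproduce(factory_after_fix, 3))
    print([s.close.await_count for s in before])  # [0, 0, 0]
    print([s.close.await_count for s in after])  # [1, 1, 1]
```

AsyncMock records how often close() was awaited. The difference between the two variants shows up directly in await_count, with no real network connection involved.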

Solution

The solution is to wrap the initialization step in a try...except block within the get_async_subtensor function. This ensures that if initialization fails, we explicitly close the AsyncSubtensor instance before allowing the exception to propagate.

Updated Code:

async def get_async_subtensor(
    network: Optional[str] = None,
    config: Optional["Config"] = None,
    mock: bool = False,
    log_verbose: bool = False,
) -> "AsyncSubtensor":
    # ... (instantiation)
    sub = AsyncSubtensor(
        network=network, config=config, mock=mock, log_verbose=log_verbose
    )
    
    # Wrap initialization to ensure cleanup on failure
    try:
        await sub.initialize()
    except Exception:
        # If initialization fails, close the instance to free resources,
        # then re-raise the original exception unchanged
        await sub.close()
        raise
    return sub

With this error handling, whenever initialize() raises an Exception, the AsyncSubtensor created by this factory is closed before the error propagates. Otherwise, it is returned fully initialized. This prevents the resource leak even when connection errors occur repeatedly.
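One gap is worth noting. Since Python 3.8, asyncio.CancelledError derives from BaseException, so `except Exception` does not run cleanup if initialize() is cancelled, for example by an asyncio.wait_for timeout. One way to cover every exit path is the AsyncExitStack "pop_all" idiom from the contextlib documentation. The sketch below is an alternative pattern, not the code in this PR. StubSubtensor and get_stub_subtensor are hypothetical stand-ins for AsyncSubtensor and the factory.

```python
import asyncio
import contextlib
from typing import Tuple


class StubSubtensor:
    """Hypothetical stand-in for AsyncSubtensor."""

    def __init__(self, hang: bool = False, fail: bool = False) -> None:
        self.hang = hang
        self.fail = fail
        self.closed = False

    async def initialize(self) -> None:
        if self.fail:
            raise ConnectionError("simulated connection failure")
        if self.hang:
            await asyncio.Event().wait()  # never set: blocks until cancelled

    async def close(self) -> None:
        self.closed = True


async def get_stub_subtensor(sub: StubSubtensor) -> StubSubtensor:
    async with contextlib.AsyncExitStack() as stack:
        # Register cleanup first so any exit from this block closes `sub`,
        # including asyncio.CancelledError, which `except Exception` would miss
        stack.push_async_callback(sub.close)
        await sub.initialize()
        # Success: move the callback onto a discarded stack so the caller owns `sub`
        stack.pop_all()
        return sub


async def demo() -> Tuple[bool, bool, bool]:
    ok = await get_stub_subtensor(StubSubtensor())
    failed = StubSubtensor(fail=True)
    with contextlib.suppress(ConnectionError):
        await get_stub_subtensor(failed)
    hung = StubSubtensor(hang=True)
    with contextlib.suppress(asyncio.TimeoutError):
        await asyncio.wait_for(get_stub_subtensor(hung), timeout=0.05)
    return ok.closed, failed.closed, hung.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (False, True, True)
```

A successfully initialized instance stays open for the caller. Both an ordinary failure and a timeout-driven cancellation close the instance before the error reaches the caller.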

Contribution by Gittensor; see my contribution statistics at https://gittensor.io/miners/details?githubId=201190161

Wrap the initialization step in a try-except block to ensure that
if AsyncSubtensor.initialize() fails, the instance is properly closed
before the exception propagates. This prevents resource leaks when
get_async_subtensor is called repeatedly in a loop.

Fixes opentensor#3235
Copilot AI review requested due to automatic review settings December 31, 2025 18:54

Copilot AI left a comment

Pull request overview

This PR fixes a memory leak in the get_async_subtensor factory function that occurs when the initialization step fails. Previously, if initialize() raised an exception, the partially-initialized AsyncSubtensor instance would be lost without proper cleanup, leaving resources like network connections and async tasks dangling.

Key Changes

  • Added a try-except block around the initialize() call in get_async_subtensor
  • Ensures close() is called on the AsyncSubtensor instance before re-raising initialization exceptions
  • Prevents resource leaks when connection errors occur repeatedly in loops

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Dairus01 changed the base branch from master to staging on December 31, 2025 18:59
Dairus01 commented Jan 1, 2026

@basfroman Happy New Year!
I would love to hear your thoughts on this PR.

@kudroma404

@Dairus01 I am not sure that I noticed any errors during subtensor initialization, but memory increases by approximately 50 MB on each iteration.

@kudroma404

@Dairus01 we ran the code on Ubuntu in Runpod.

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

@kudroma404

@Dairus01 @basfroman how to reproduce:

main.py

import bittensor as bt
import asyncio

async def main():
    while True:
        subtensor = await bt.get_async_subtensor("finney")
        await subtensor.close()
        await asyncio.sleep(10)
        print("Iteration")

if __name__ == "__main__":
    asyncio.run(main())

main.config.js

module.exports = {
  apps: [{
    name: 'bittensor_test',
    script: 'main.py',
    interpreter: '/usr/bin/python3.13',
  }]
};

As the screenshot below shows, the app's memory usage grows every 10-15 seconds:

[Screenshot: app memory usage increasing over time]
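To quantify the growth from inside the process rather than from the pm2 dashboard, the create/close loop in main.py could be wrapped in a tracemalloc-based harness like the sketch below. measure_growth, leaky_factory and noop_close are hypothetical names. One limitation matters here: tracemalloc only sees allocations made through Python's allocators. Memory held by native extensions or the OS, which the RSS figure pm2 reports does include, may not appear.

```python
import asyncio
import gc
import tracemalloc
from typing import Awaitable, Callable, List


async def measure_growth(
    factory: Callable[[], Awaitable[object]],
    close: Callable[[object], Awaitable[None]],
    iterations: int,
) -> List[int]:
    """Return the traced Python heap size (bytes) after each create/close cycle."""
    tracemalloc.start()
    sizes = []
    try:
        for _ in range(iterations):
            obj = await factory()
            await close(obj)
            del obj
            gc.collect()  # drop reference cycles so only real retention remains
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
    finally:
        tracemalloc.stop()
    return sizes


_retained: list = []  # hypothetical global cache that is never cleared


async def leaky_factory() -> bytearray:
    buf = bytearray(100_000)
    _retained.append(buf)  # simulates state that survives close()
    return buf


async def noop_close(_obj: object) -> None:
    return None


if __name__ == "__main__":
    sizes = asyncio.run(measure_growth(leaky_factory, noop_close, 5))
    # each delta should be about 100_000 bytes, one retained buffer per cycle
    print([b - a for a, b in zip(sizes, sizes[1:])])
```

Applied to the reproduction above, the factory could be `lambda: bt.get_async_subtensor("finney")` and the close function `lambda s: s.close()`. A steadily rising series would then point to retention on the Python side. A flat series alongside growing RSS would suggest native memory instead.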

Development

Successfully merging this pull request may close these issues.

Memory leak if subtensor created/removed in a loop
