
fix: fix mem leak #1201

Merged — hiworldwzj merged 2 commits into main from fix_leak on Feb 4, 2026

Conversation

@blueswhen (Collaborator)

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @blueswhen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on resolving memory leak issues within the LightLLM server by implementing a more robust and centralized approach to managing shared memory and pinned memory tensors. It ensures that resources are properly released when requests are completed, thereby improving the stability and efficiency of the system. Additionally, the changes include significant enhancements to the benchmark script, making it more resilient to transient network failures and improving its overall reliability for performance testing.

Highlights

  • Memory Leak Fixes: Implemented explicit shared memory detachment and closing mechanisms for request objects and their associated arrays (e.g., prompt IDs, log probabilities) across components such as the detokenization manager, the HTTP server manager, and inference batch processing (a sketch of the detach pattern follows this list).
  • Centralized Pinned Memory Management: Introduced and utilized a g_pin_mem_manager to centralize and optimize the creation of pinned memory tensors, replacing direct torch.from_numpy().pin_memory().cuda() calls with a more managed approach.
  • Radix Cache Robustness: Ensured proper cloning of tensors (token_id_key, token_mem_index_value) when creating new nodes in the radix cache to prevent unintended shared references and potential data corruption. Also, refined reference counting for the root node.
  • Benchmark Script Improvements: Enhanced the benchmark utility with retry logic for network requests, improved error handling for stream disconnections, and added task timeouts in the response collector to prevent indefinite hangs and ensure more reliable testing.
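
The detach-vs-unlink distinction is the core of the leak fix: each process must close its own handle to a request's shared memory segments, while only the owning side unlinks the underlying object. Below is a minimal sketch of this pattern; ShmArray, detach_shm, and close_shm echo the names used in this PR, but the bodies are illustrative, not the actual LightLLM implementation.

from multiprocessing import shared_memory

import numpy as np


class ShmArray:
    def __init__(self, name: str, shape, dtype=np.int64, create: bool = False):
        nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=nbytes)
        self.arr = np.ndarray(shape, dtype=dtype, buffer=self.shm.buf)

    def detach_shm(self):
        # Drop this process's numpy view first so the underlying mmap can
        # close, then close the local handle. close() does NOT destroy the
        # segment; other processes (detokenizer, router) can keep using it.
        self.arr = None
        self.shm.close()
        self.shm = None

    def close_shm(self):
        # Full release by the owner: close the local handle, then unlink
        # the named object so the OS can reclaim the memory.
        self.arr = None
        self.shm.close()
        self.shm.unlink()
        self.shm = None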


Changelog
  • lightllm/common/req_manager.py
    • Imported g_pin_mem_manager for unified pinned memory management (a sketch of this pattern follows the changelog).
    • Updated init_req_sampling_params to use g_pin_mem_manager.gen_from_list for prompt_ids creation.
    • Refactored gen_cpu_out_token_counter_sampling_params to use g_pin_mem_manager.gen_from_list for various tensor allocations.
  • lightllm/server/core/objs/req.py
    • Added release_shm_arrays method to ShmReq to detach shared memory arrays (shm_prompt_ids, shm_logprobs) upon request completion.
  • lightllm/server/core/objs/shm_array.py
    • Introduced detach_shm method to ShmArray for closing shared memory connections without unlinking the underlying object.
  • lightllm/server/detokenization/manager.py
    • Integrated decode_req.req.release_shm_arrays() call into remove_finished_reqs for proper shared memory detachment.
  • lightllm/server/httpserver/manager.py
    • Added explicit calls to req.shm_prompt_ids.close_shm() and req.shm_logprobs.close_shm() during resource recycling.
  • lightllm/server/router/dynamic_prompt/radix_cache.py
    • Modified split_node and add_and_return_new_child to use .clone() for token_id_key and token_mem_index_value assignments (see the aliasing example after this changelog).
    • Added a conditional check (if update_refs:) before decrementing the root node's reference counter in match_prefix.
  • lightllm/server/router/model_infer/infer_batch.py
    • Added a call to req.shm_req.release_shm_arrays() in the _filter method to detach shared memory for finished inference requests.
  • lightllm/server/router/model_infer/mode_backend/generic_post_process.py
    • Imported g_pin_mem_manager for consistent pinned memory management.
    • Updated sample and _get_post_sample_tensors to use g_pin_mem_manager.gen_from_list for tensor creation.
  • test/benchmark/service/benchmark_qps.py
    • Added _DEFAULT_RETRY constant and implemented retry logic for async_post_stream_openai and async_post_stream_lightllm.
    • Enhanced response_collector with task_timeout_s and improved error handling to prevent hangs and ensure reliable counter updates.
    • Removed a debug print statement and made a minor formatting change to run_continuous_benchmark.
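
To make the pinned-memory change concrete, here is a minimal sketch of the centralized pattern the changelog describes. The real g_pin_mem_manager lives in LightLLM; the cache-and-copy body and the gen_from_list signature below are assumptions for illustration, not the actual code.

import torch


class PinMemManager:
    def __init__(self):
        self._cache = {}  # (key, dtype) -> reusable pinned staging tensor

    def gen_from_list(self, key: str, data: list, dtype: torch.dtype) -> torch.Tensor:
        size = len(data)
        buf = self._cache.get((key, dtype))
        if buf is None or buf.numel() < size:
            # Allocate pinned host memory once and keep it for reuse,
            # instead of paying a fresh pin_memory() on every request.
            buf = torch.empty(max(size, 1024), dtype=dtype, pin_memory=True)
            self._cache[(key, dtype)] = buf
        staging = buf[:size]
        staging.copy_(torch.tensor(data, dtype=dtype))
        # .cuda() copies into a new device tensor; non_blocking=True overlaps
        # the host-to-device transfer, which is safe from pinned memory.
        return staging.cuda(non_blocking=True)


g_pin_mem_manager = PinMemManager()

# Usage, replacing the old per-call allocation:
#   old: torch.from_numpy(np.array(prompt_ids)).pin_memory().cuda()
#   new: g_pin_mem_manager.gen_from_list("prompt_ids", prompt_ids, torch.int64)

Note that a real manager must also ensure the staging buffer is not rewritten before the asynchronous copy completes (e.g., by keying buffers per call site, as the key argument hints).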
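
The radix-cache fix is easiest to see with a toy example: slicing a torch tensor returns a view over the same storage, so two tree nodes built from slices of one key tensor would silently mutate each other's data. This is a generic illustration, not the radix_cache.py code itself.

import torch

token_id_key = torch.tensor([10, 11, 12, 13])

view = token_id_key[:2]          # shares storage with token_id_key
copy = token_id_key[:2].clone()  # owns its own storage

token_id_key[0] = 99
print(view)  # tensor([99, 11]) -- corrupted by the parent's mutation
print(copy)  # tensor([10, 11]) -- unaffected
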
Activity
  • The author blueswhen initiated this pull request to address memory leak issues.
  • The changes involve a comprehensive refactoring of memory management, particularly for shared memory and pinned memory tensors.
  • New methods for detaching and closing shared memory arrays have been introduced and integrated into the request lifecycle.
  • The benchmark script has been made more robust with added retry mechanisms and timeout handling for network operations.

@gemini-code-assist bot left a comment

Code Review

This pull request primarily addresses memory leaks related to shared and pinned memory by introducing explicit resource management functions. It ensures shared memory is properly detached and unlinked, and centralizes pinned memory allocation. Additionally, it includes important correctness fixes in the radix cache logic and significantly improves the robustness of the benchmark script. The changes are well-implemented and address critical issues. I have a few suggestions to further improve the benchmark script's maintainability and retry logic.

Comment on lines +130 to +131
if response.status != 200:
    return []
Severity: medium

The retry loop is intended to handle transient errors, but returning [] here on any non-200 status code prevents retries for HTTP errors (e.g., 503 Service Unavailable). Consider checking the status code and only returning immediately for non-retriable errors (like 4xx client errors), while continuing the loop for retriable server errors (e.g., 5xx).
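
A hedged sketch of the suggested policy: treat 4xx as final, retry 5xx and connection failures with backoff. _DEFAULT_RETRY matches the constant named in this PR; the loop body and helper name below are illustrative, not the benchmark script's code.

import asyncio

import aiohttp

_DEFAULT_RETRY = 3


async def post_with_retry(session: aiohttp.ClientSession, url: str, payload: dict):
    for attempt in range(_DEFAULT_RETRY):
        try:
            async with session.post(url, json=payload) as response:
                if response.status == 200:
                    return await response.json()
                if 400 <= response.status < 500:
                    # Client errors are not transient; retrying cannot help.
                    return []
                # 5xx (e.g. 503 Service Unavailable) is likely transient:
                # fall through and retry.
        except aiohttp.ClientError:
            pass  # connection reset / stream disconnect: also retriable
        await asyncio.sleep(2 ** attempt)  # simple exponential backoff
    return []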

hiworldwzj merged commit d09bb92 into main on Feb 4, 2026
1 check passed
hiworldwzj deleted the fix_leak branch on February 4, 2026 at 09:12
