minor fix: vLLM LoRA request cleanup for issue #751 (#765)
Conversation
Summary of Changes

Hello @TinLongYu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical `KeyError` that occurred during the cleanup of LoRA requests within the vLLM integration, specifically impacting LoRA-based reinforcement learning fine-tuning. The change introduces a robust check in the `finish_request` function.
Code Review
This pull request addresses a KeyError crash during LoRA-based fine-tuning by adding a defensive check in the finish_request function. The change is a targeted fix to prevent removing a request ID that is no longer in the running_requests list. The approach of monkey-patching vllm's LoRARequestStates is a reasonable short-term solution as noted. My review includes a suggestion to further improve the robustness of the finish_request function by also handling cases where the LoRA name might not be present in lora_name_to_stats, preventing another potential KeyError.
```python
lora_stats = self.lora_name_to_stats[req_state.lora_name]
if req_state.request_id in lora_stats.running_requests:
    lora_stats.running_requests.remove(req_state.request_id)
```
The current implementation at line 343 (self.lora_name_to_stats[req_state.lora_name]) could still raise a KeyError if req_state.lora_name is not present in the lora_name_to_stats dictionary. To make this function more robust and prevent another potential crash, it's better to use the .get() method for safer dictionary access. This also allows for a more concise check.
```diff
-lora_stats = self.lora_name_to_stats[req_state.lora_name]
-if req_state.request_id in lora_stats.running_requests:
-    lora_stats.running_requests.remove(req_state.request_id)
+lora_stats = self.lora_name_to_stats.get(req_state.lora_name)
+if lora_stats and req_state.request_id in lora_stats.running_requests:
+    lora_stats.running_requests.remove(req_state.request_id)
```
```python
def finish_request(self, req_state: 'RequestState'):
    if req_state.lora_name is None:
        return
    lora_stats = self.lora_name_to_stats[req_state.lora_name]
    if req_state.request_id in lora_stats.running_requests:
        lora_stats.running_requests.remove(req_state.request_id)
```
LGTM, but could you add a comment about the change? We may want to note: (1) this is a simple copy from vLLM v0.11.0 with an added if-condition to check whether the request is still running; (2) this patch can be removed once vLLM is updated to v0.12.0 or higher.
A better solution may be to apply the patch only under the following condition:

```python
from areal.utils import pkg_version

if not pkg_version.is_version_higher("vllm", "0.11.1"):
    setattr(...)  # apply the patch
```
FYI we wanted to update vLLM and SGLang, but they do not support the same PyTorch version until the latest vllm 0.14.0.rc1. Hopefully we will update their versions when 0.14.0 is officially released.
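For illustration, here is a minimal sketch of how the version-gated patch suggested above could be wired up. The helper name `apply_lora_cleanup_patch` and the pattern of passing vLLM's `LoRARequestStates` class in as an argument are assumptions made for this sketch (the class's real import path varies across vLLM releases), not the PR's actual code:

```python
# Sketch only: combines the patched finish_request with the version gate
# suggested above. The helper name and the idea of passing in vLLM's
# LoRARequestStates class are illustrative assumptions.
from areal.utils import pkg_version


def _patched_finish_request(self, req_state):
    # Copied from vLLM v0.11.0, plus a membership check so that finishing
    # an already-cleaned-up request no longer raises KeyError.
    if req_state.lora_name is None:
        return
    lora_stats = self.lora_name_to_stats.get(req_state.lora_name)
    if lora_stats and req_state.request_id in lora_stats.running_requests:
        lora_stats.running_requests.remove(req_state.request_id)


def apply_lora_cleanup_patch(lora_request_states_cls):
    """Attach the patched method only on vLLM versions that still need it."""
    if not pkg_version.is_version_higher("vllm", "0.11.1"):
        setattr(lora_request_states_cls, "finish_request", _patched_finish_request)
```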
Description
This PR fixes a crash during LoRA-based RL fine-tuning with vLLM caused by a `KeyError` when cleaning up finished LoRA requests. It applies a simple defensive check before removing a request ID from `running_requests` and only affects the LoRA training pipeline. The `finish_request` function is updated to simply check `if req_state.request_id in lora_stats.running_requests:` before removal. This patch may be removed once vLLM is updated to v0.12.0+.
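For context, a minimal repro of the failure mode this guards against, assuming `running_requests` is a set (a `KeyError` on removal points at a set or dict rather than a list, which would raise `ValueError`):

```python
# Illustrative only: shows why the membership check prevents the crash.
running_requests = {"req-1"}

running_requests.remove("req-1")       # first cleanup succeeds
# running_requests.remove("req-1")     # a second cleanup would raise KeyError

if "req-1" in running_requests:        # the defensive check added in this PR
    running_requests.remove("req-1")   # safely skipped when already removed
```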
File changed

`areal/thirdparty/vllm/areal_vllm_server.py`

Related Issue
Fixes #751
Testing
Type of Change

Bug fix (non-breaking change; existing functionality continues to work as expected)
Checklist

(`jb build docs`, `/gemini review`)

Breaking Change Details (if applicable):
Additional Context
Need help? Check the Contributing Guide or ask in GitHub Discussions!