[Bug Fix] Fix Tools compatibility, max_token restrictions, and EOS token issues in Proxy mode #736
Conversation
Summary of Changes

Hello @yulangz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request delivers crucial bug fixes for Proxy mode, enhancing its stability and correctness. It resolves issues related to tool invocation, token-limit enforcement in multi-turn dialogues, and accurate parsing of model outputs, including the correct handling of reasoning and tool calls and the removal of unnecessary tokens. These changes are vital for reliable model training and interaction within the Proxy environment.

Highlights
Code Review
This pull request introduces several important bug fixes for training in Proxy mode. The changes correctly handle tool calls in concatenation mode, enforce max_tokens restrictions to prevent training failures, and fix issues with extraneous EOS tokens in the output. A reasoning_parser is now properly utilized to avoid misinterpreting content within <think> blocks as tool calls. Additionally, the token concatenation logic has been improved to strictly follow the chat template.
My review focuses on the correctness of these fixes. The logic for handling token limits, parsing tool calls, and managing chat templates appears sound. I've also noted the addition of new unit tests, which is a great improvement. I have only a couple of minor suggestions to remove some leftover debug print statements.
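To make the parsing-order point concrete, here is a hedged sketch with simple regexes standing in for the real reasoning and tool-call parsers; none of these names come from the PR's code:

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[str]:
    """Parse tool calls only from the content left after the reasoning
    block is removed, so tool-call-shaped text inside <think>...</think>
    is not misinterpreted as a real tool call."""
    content = THINK_RE.sub("", text)
    return TOOL_RE.findall(content)

out = "<think>maybe call <tool_call>draft</tool_call>?</think><tool_call>real</tool_call>"
assert extract_tool_calls(out) == ["real"]
```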
Pull request overview
This PR fixes critical issues in Proxy mode training related to tool calls, token limits, and EOS token handling. The changes enable proper tool invocation in concat mode, enforce max_tokens limits to prevent training failures, correctly parse reasoning content before extracting tool calls, and remove extraneous EOS tokens from output.
Key changes:
- Enhanced tool call processing to use reasoning parser, preventing incorrect extraction of tool calls from reasoning content
- Implemented proper `max_tokens` enforcement via the `engine_max_tokens` parameter to prevent exceeding model token limits (see the sketch after this list)
- Fixed token concatenation in concat mode to properly handle tools and follow chat templates (e.g., a missing '\n' in Qwen3)
- Added an `output_tokens_without_stop` property to exclude EOS tokens from decoded output
- Modified the completions return structure to be session-scoped, enabling better dump functionality
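A minimal sketch of that `max_tokens` clamping; the function and parameter names here are illustrative, not the actual `client.py` implementation:

```python
def clamp_max_tokens(
    requested_max_tokens: int | None,
    prompt_len: int,
    engine_max_tokens: int,
) -> int:
    """Return a max_tokens value that keeps prompt + output within the
    engine's context budget."""
    # Tokens still available after accounting for the prompt.
    budget = engine_max_tokens - prompt_len
    if budget <= 0:
        raise ValueError(
            f"Prompt length {prompt_len} already exceeds "
            f"engine_max_tokens={engine_max_tokens}"
        )
    if requested_max_tokens is None:
        return budget
    return min(requested_max_tokens, budget)
```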
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
Summary per file:
| File | Description |
|---|---|
| areal/experimental/openai/tool_call_parser.py | Added reasoning_parser parameter to separate reasoning content from tool calls before extraction |
| areal/experimental/openai/client.py | Implemented engine_max_tokens enforcement, fixed concat mode to use chat_template with tools support, added debug prints, used output_tokens_without_stop for decoding |
| areal/experimental/openai/proxy.py | Added reasoning_parser and engine_max_tokens parameters, updated return type to be session-scoped, updated imports |
| areal/api/io_struct.py | Added output_tokens_without_stop property to exclude stop tokens from output, reformatted assertion |
| areal/api/cli_args.py | Added use_beam_search to unsupported OpenAI parameters list with clarifying comment |
| examples/experimental/proxy/train.py | Added dump functionality for session completions, updated completion merging logic, added aiofiles imports |
| examples/experimental/proxy/train_sep.py | Updated completion merging logic to match train.py, added reasoning_parser parameter |
| examples/experimental/proxy/gsm8k_multi_turn_agent.py | Fixed message serialization to use exclude_none=True for API compatibility |
| areal/tests/experimental/openai/test_tool_call_parser.py | Added comprehensive tests for tool call parsing with and without reasoning content |
| areal/experimental/openai/types.py | Reformatted multi-line assertion for better readability |
| areal/experimental/openai/cache.py | Reformatted multi-line assertion for better readability |
| areal/utils/proxy_utils.py | Added TYPE_CHECKING imports for better type hinting |
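The `output_tokens_without_stop` property listed above can be pictured with a small self-contained sketch; the field names are assumed for illustration, and the real definition lives in areal/api/io_struct.py:

```python
from dataclasses import dataclass, field

@dataclass
class CompletionOutput:
    """Illustrative stand-in for the real io_struct response type."""
    output_tokens: list[int] = field(default_factory=list)
    stop_token_ids: list[int] = field(default_factory=list)

    @property
    def output_tokens_without_stop(self) -> list[int]:
        # Drop a single trailing stop/EOS token so it never leaks
        # into the decoded text.
        if self.output_tokens and self.output_tokens[-1] in self.stop_token_ids:
            return self.output_tokens[:-1]
        return self.output_tokens

# 151645 is Qwen's <|im_end|> token id, used here only as an example.
out = CompletionOutput(output_tokens=[11, 22, 151645], stop_token_ids=[151645])
assert out.output_tokens_without_stop == [11, 22]
```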
```diff
     tools: list[Any],
-    tool_call_parser: str | None,
+    tool_call_parser: str,
+    reasoning_parser: str,
```
Copilot AI commented on Dec 17, 2025:
The function signature declares reasoning_parser as str (non-nullable), but callers may pass None since self.reasoning_parser can be None (see line 160 and 399 in client.py where it's typed as str | None = None). When reasoning_parser is None, the call to ReasoningParser(reasoning_parser) on line 54 will likely fail. Either make the parameter nullable with proper None handling, or ensure all callers provide a non-None value.
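A minimal sketch of the nullable handling suggested here, with a regex standing in for the real `ReasoningParser` (the function name and behavior are illustrative, not the PR's code):

```python
import re

def split_reasoning(text: str, reasoning_parser: str | None) -> tuple[str | None, str]:
    """Only parse reasoning when a parser name is configured; otherwise
    pass the text through instead of constructing ReasoningParser(None)."""
    if reasoning_parser is None:
        # No parser configured: nothing is treated as reasoning, so
        # downstream tool-call extraction sees the full text.
        return None, text
    # Stand-in for ReasoningParser(reasoning_parser): strip a <think> block.
    match = re.match(r"\s*<think>(.*?)</think>(.*)", text, re.DOTALL)
    if match:
        return match.group(1), match.group(2)
    return None, text

assert split_reasoning("<think>plan</think>answer", "qwen3") == ("plan", "answer")
assert split_reasoning("<think>plan</think>answer", None) == (None, "<think>plan</think>answer")
```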
Please take Copilot's suggestion and check the scenarios where reasoning_parser is None.
/gemini review
Code Review
This pull request introduces a comprehensive set of fixes and improvements for training in Proxy mode. The changes address critical issues with tool compatibility, token restrictions, and EOS token handling. Key improvements include a more robust token concatenation logic using chat_template, proper enforcement of max_tokens with graceful error handling, and correct parsing of tool calls by using a reasoning_parser. The addition of extensive unit tests for the new logic is a significant plus. Overall, this is a high-quality contribution that greatly improves the stability and correctness of the proxy mode. I have one suggestion to reduce minor code duplication in the error handling logic.
/gemini review
Code Review
This pull request introduces a comprehensive set of fixes and improvements for training in Proxy mode, particularly enhancing compatibility with tool usage, max_token restrictions, and EOS token handling. The changes are robust, well-tested, and significantly improve the reliability of multi-turn conversations in concat mode. Key changes include a rewrite of the prompt concatenation logic to correctly use chat templates, a fix in the tool call parser to handle reasoning blocks, and better error handling. I've identified a critical issue in the reasoning block detection logic that could lead to incorrect text parsing and have suggested a more robust implementation. I also noted a minor code duplication that could be refactored for better maintainability. Overall, this is an excellent PR that addresses several important issues.
mark skip for failing test due to pad_token issue
Description
This PR addresses a series of issues encountered when training in Proxy mode:
- In concat mode, requests did not include `tools` and `tool_call`, preventing the model from invoking tools.
- `max_tokens` could not be limited according to the `gconfig.max_tokens` configuration, causing the total token count to exceed `mb_spec.max_tokens_per_mb` and making training impossible.
- The `reasoning_parser` was not used, leading to content within `<think>` blocks that matched the tool call format being incorrectly processed as a `tool_call`.
- Token concatenation did not strictly follow the `chat_template`. For example, in the Qwen3 model, there was a missing '\n' character between `im_end` and `im_start` (see the sketch below).

Later, I will add unit tests for all the issues I encountered.
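A hedged illustration of a template-faithful concatenation scheme, using Hugging Face tokenizers; the model name is illustrative and this is not the PR's implementation:

```python
from transformers import AutoTokenizer

# Model name is illustrative; any Qwen-style chat template shows the idea.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "bye"},
]

# Render growing prefixes of the conversation and keep only the suffix each
# turn adds. Every separator, including the '\n' the template emits between
# <|im_end|> and the next <|im_start|>, then comes from the template itself
# rather than from hand-written concatenation.
prev, chunks = "", []
for i in range(1, len(messages) + 1):
    cur = tok.apply_chat_template(messages[:i], tokenize=False)
    chunks.append(cur[len(prev):])
    prev = cur

# Holds as long as the template has the prefix property (each rendering
# extends the previous one), which is true for plain-text Qwen turns.
assert "".join(chunks) == tok.apply_chat_template(messages, tokenize=False)
```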
Related Issue
Fixes #(issue)
Type of Change
- Bug fix (non-breaking change which fixes an issue)
Checklist
- Docs build: `jb build docs`
- AI review: `/gemini review`

Breaking Change Details (if applicable):
Additional Context
Need help? Check the Contributing Guide or ask in GitHub Discussions!