
Conversation

@yulangz (Collaborator) commented Dec 17, 2025

Description

This PR addresses a series of issues encountered when training in Proxy mode:

  • In concat mode, the token concatenation logic did not correctly handle tools and tool_call messages, preventing the model from invoking tools.
  • In multi-turn conversations, max_tokens was not capped according to the gconfig.max_tokens configuration, so the total token count could exceed mb_spec.max_tokens_per_mb and make training impossible.
  • When processing tool calls in the model's output, the reasoning_parser was not used, so content inside reasoning (<think>) blocks that matched the tool-call format was incorrectly parsed as a tool_call.
  • The returned text contained an extraneous EOS token.
  • During concatenation in concat mode, the process did not strictly adhere to the chat_template. For example, with the Qwen3 model, the '\n' between <|im_end|> and <|im_start|> was missing (see the sketch below).

Later, I will add unit tests for all the issues I encountered.
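
To make the chat_template issue concrete, here is a minimal sketch of prefix-based incremental tokenization, assuming a HuggingFace tokenizer; the helper name `concat_turn_tokens` and the model checkpoint are illustrative, not the PR's actual implementation:

```python
# A minimal sketch of template-faithful incremental tokenization, assuming a
# HuggingFace tokenizer. The helper name and model checkpoint are
# illustrative; this is not the PR's actual code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

def concat_turn_tokens(prev_messages, new_messages, tools=None):
    """Return only the token ids contributed by `new_messages`.

    Rendering the full conversation with the chat template and slicing off
    the previously rendered prefix preserves separators such as the newline
    between <|im_end|> and <|im_start|>, which naive per-message
    tokenization drops.
    """
    prev_ids = tokenizer.apply_chat_template(
        prev_messages, tools=tools, tokenize=True
    )
    full_ids = tokenizer.apply_chat_template(
        prev_messages + new_messages,
        tools=tools,
        add_generation_prompt=True,
        tokenize=True,
    )
    # Assumes the template is prefix-stable across appended turns.
    assert full_ids[: len(prev_ids)] == prev_ids
    return full_ids[len(prev_ids):]
```

Slicing the full render against the previous prefix keeps the multi-turn token stream identical to what the serving engine sees, which per-message tokenization does not guarantee.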

Related Issue

Fixes #(issue)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

Copilot AI review requested due to automatic review settings December 17, 2025 03:26
@gemini-code-assist (bot) commented

Summary of Changes

Hello @yulangz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers crucial bug fixes for the Proxy mode, enhancing its stability and correctness. It resolves issues related to tool invocation, token limit enforcement in multi-turn dialogues, and accurate parsing of model outputs, including the correct handling of reasoning and tool calls, and the removal of unnecessary tokens. These changes are vital for reliable model training and interaction within the Proxy environment.

Highlights

  • Tool Compatibility in Concat Mode: The token concatenation logic has been fixed to correctly handle tools and tool_call messages, enabling the model to invoke tools properly in concat mode.
  • Max Token Restrictions: The max_tokens limitation in multi-turn conversations has been addressed, ensuring that the total token count adheres to gconfig.max_tokens and mb_spec.max_tokens_per_mb, preventing training failures (see the sketch after this list).
  • Reasoning Parser for Tool Calls: The reasoning_parser is now correctly used when processing tool calls in the model's output, preventing content that matches the tool call format from being incorrectly processed as a tool_call.
  • Extraneous EOS Token Removal: An issue where the returned text contained an extraneous End-Of-Sequence (EOS) token has been resolved, leading to cleaner model outputs.
  • Chat Template Adherence: The concatenation process in concat mode now strictly adheres to the chat_template, fixing issues like the missing newline between <|im_end|> and <|im_start|> in models like Qwen3.
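
The max-token enforcement can be pictured as clamping each request's generation budget. A rough sketch, where the name `engine_max_tokens` follows the PR's description but the surrounding logic is a simplification, not the actual client.py code:

```python
# Illustrative sketch of per-request max_tokens clamping; an assumption for
# exposition, not the repository's implementation.
def clamp_max_tokens(
    requested_max_tokens: int | None,
    prompt_len: int,
    engine_max_tokens: int,
) -> int:
    """Cap generation so prompt + output never exceeds the engine limit."""
    budget = engine_max_tokens - prompt_len
    if budget <= 0:
        raise ValueError(
            f"prompt ({prompt_len} tokens) already exceeds the "
            f"engine limit of {engine_max_tokens} tokens"
        )
    if requested_max_tokens is None:
        return budget
    return min(requested_max_tokens, budget)
```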

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces several important bug fixes for training in Proxy mode. The changes correctly handle tool calls in concatenation mode, enforce max_tokens restrictions to prevent training failures, and fix issues with extraneous EOS tokens in the output. A reasoning_parser is now properly utilized to avoid misinterpreting content within <think> blocks as tool calls. Additionally, the token concatenation logic has been improved to strictly follow the chat template.

My review focuses on the correctness of these fixes. The logic for handling token limits, parsing tool calls, and managing chat templates appears sound. I've also noted the addition of new unit tests, which is a great improvement. I have only a couple of minor suggestions to remove some leftover debug print statements.

Copilot AI left a comment


Pull request overview

This PR fixes critical issues in Proxy mode training related to tool calls, token limits, and EOS token handling. The changes enable proper tool invocation in concat mode, enforce max_tokens limits to prevent training failures, correctly parse reasoning content before extracting tool calls, and remove extraneous EOS tokens from output.

Key changes:

  • Enhanced tool call processing to use reasoning parser, preventing incorrect extraction of tool calls from reasoning content
  • Implemented proper max_tokens enforcement via engine_max_tokens parameter to prevent exceeding model token limits
  • Fixed token concatenation in concat mode to properly handle tools and follow chat templates (e.g., missing '\n' in Qwen3)
  • Added output_tokens_without_stop property to exclude EOS tokens from decoded output (sketched below)
  • Modified completions return structure to be session-scoped, enabling better dump functionality
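
The EOS fix can be sketched as a property that drops a trailing stop token before decoding. The property name comes from the PR; the dataclass around it is assumed for illustration, not the actual io_struct.py definition:

```python
from dataclasses import dataclass, field

# Rough sketch of an `output_tokens_without_stop` property; the dataclass
# and its fields are illustrative.
@dataclass
class CompletionOutput:
    output_tokens: list[int]
    stop_token_ids: set[int] = field(default_factory=set)

    @property
    def output_tokens_without_stop(self) -> list[int]:
        """Output tokens with a trailing stop/EOS token stripped, so
        decoding does not leak the EOS marker into the returned text."""
        toks = self.output_tokens
        if toks and toks[-1] in self.stop_token_ids:
            return toks[:-1]
        return toks
```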

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.

Show a summary per file:

  • areal/experimental/openai/tool_call_parser.py: Added reasoning_parser parameter to separate reasoning content from tool calls before extraction
  • areal/experimental/openai/client.py: Implemented engine_max_tokens enforcement, fixed concat mode to use chat_template with tools support, added debug prints, used output_tokens_without_stop for decoding
  • areal/experimental/openai/proxy.py: Added reasoning_parser and engine_max_tokens parameters, updated return type to be session-scoped, updated imports
  • areal/api/io_struct.py: Added output_tokens_without_stop property to exclude stop tokens from output, reformatted assertion
  • areal/api/cli_args.py: Added use_beam_search to unsupported OpenAI parameters list with clarifying comment
  • examples/experimental/proxy/train.py: Added dump functionality for session completions, updated completion merging logic, added aiofiles imports
  • examples/experimental/proxy/train_sep.py: Updated completion merging logic to match train.py, added reasoning_parser parameter
  • examples/experimental/proxy/gsm8k_multi_turn_agent.py: Fixed message serialization to use exclude_none=True for API compatibility
  • areal/tests/experimental/openai/test_tool_call_parser.py: Added comprehensive tests for tool call parsing with and without reasoning content
  • areal/experimental/openai/types.py: Reformatted multi-line assertion for better readability
  • areal/experimental/openai/cache.py: Reformatted multi-line assertion for better readability
  • areal/utils/proxy_utils.py: Added TYPE_CHECKING imports for better type hinting


      tools: list[Any],
  -   tool_call_parser: str | None,
  +   tool_call_parser: str,
  +   reasoning_parser: str,
Copilot AI commented Dec 17, 2025


The function signature declares reasoning_parser as str (non-nullable), but callers may pass None since self.reasoning_parser can be None (see line 160 and 399 in client.py where it's typed as str | None = None). When reasoning_parser is None, the call to ReasoningParser(reasoning_parser) on line 54 will likely fail. Either make the parameter nullable with proper None handling, or ensure all callers provide a non-None value.
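
A minimal, self-contained sketch of the suggested None handling; `ReasoningParser` below is a stand-in for the real class, and only the guard pattern is the point:

```python
import re

# Stand-in for the real ReasoningParser; assumes <think>...</think> syntax.
class ReasoningParser:
    def __init__(self, name: str):
        self.name = name  # e.g. "qwen3"; would select the block syntax

    def split(self, text: str) -> tuple[str | None, str]:
        """Return (reasoning, remainder)."""
        m = re.match(r"\s*<think>(.*?)</think>(.*)", text, re.DOTALL)
        if m:
            return m.group(1).strip(), m.group(2)
        return None, text

def split_reasoning(
    text: str, reasoning_parser: str | None
) -> tuple[str | None, str]:
    # Guard against None instead of assuming the annotation's `str`.
    if reasoning_parser is None:
        return None, text
    return ReasoningParser(reasoning_parser).split(text)
```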

A collaborator commented:

Please take Copilot's suggestion and check the scenarios where reasoning_parser is None.

@yulangz removed and re-applied the safe-to-test (Ready to run unit-tests in a PR.) label on Dec 19, 2025
@nuzant (Collaborator) commented Dec 22, 2025

/gemini review

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a comprehensive set of fixes and improvements for training in Proxy mode. The changes address critical issues with tool compatibility, token restrictions, and EOS token handling. Key improvements include a more robust token concatenation logic using chat_template, proper enforcement of max_tokens with graceful error handling, and correct parsing of tool calls by using a reasoning_parser. The addition of extensive unit tests for the new logic is a significant plus. Overall, this is a high-quality contribution that greatly improves the stability and correctness of the proxy mode. I have one suggestion to reduce minor code duplication in the error handling logic.

@yulangz (Collaborator, Author) commented Dec 22, 2025

/gemini review

@gemini-code-assist (bot) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive set of fixes and improvements for training in Proxy mode, particularly enhancing compatibility with tool usage, max_token restrictions, and EOS token handling. The changes are robust, well-tested, and significantly improve the reliability of multi-turn conversations in concat mode. Key changes include a rewrite of the prompt concatenation logic to correctly use chat templates, a fix in the tool call parser to handle reasoning blocks, and better error handling. I've identified a critical issue in the reasoning block detection logic that could lead to incorrect text parsing and have suggested a more robust implementation. I also noted a minor code duplication that could be refactored for better maintainability. Overall, this is an excellent PR that addresses several important issues.

@nuzant and @yulangz removed and re-applied the safe-to-test label several times on Dec 22, 2025
mark skip for failing test due to pad_token issue
@yulangz removed and re-applied the safe-to-test label on Dec 22, 2025
@nuzant nuzant merged commit fa0eb7b into main Dec 23, 2025
7 checks passed
@nuzant nuzant deleted the wht/feature/fix_stop_tokens_of_generate_in_proxy_mode branch December 23, 2025 02:13

Labels

safe-to-test Ready to run unit-tests in a PR.


5 participants