
Conversation

@yulangz (Collaborator) commented Dec 17, 2025

Description

This PR addresses a series of issues encountered when training in Proxy mode:

  • In concat mode, the token concatenation logic did not correctly handle tools and tool_call messages, preventing the model from invoking tools.
  • In multi-turn conversations, max_tokens was not capped according to the gconfig.max_tokens configuration, so the total token count could exceed mb_spec.max_tokens_per_mb and make training impossible.
  • When processing tool calls in the model's output, the reasoning_parser was not used, so content inside reasoning (<think>) blocks that matched the tool-call format was incorrectly parsed as a tool_call.
  • The returned text contained an extraneous EOS token.
  • During concatenation in concat mode, the process did not strictly adhere to the chat_template. For example, with the Qwen3 model, the '\n' between <|im_end|> and <|im_start|> was missing (see the sketch below).

Later, I will add unit tests for all the issues I encountered.
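
To make the chat_template issue concrete, here is a minimal sketch of prefix-based incremental tokenization, assuming a HuggingFace tokenizer; the helper name `concat_turn_tokens` and the model checkpoint are illustrative, not the PR's actual implementation:

```python
# A minimal sketch of template-faithful incremental tokenization, assuming a
# HuggingFace tokenizer. The helper name and model checkpoint are
# illustrative; this is not the PR's actual code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

def concat_turn_tokens(prev_messages, new_messages, tools=None):
    """Return only the token ids contributed by `new_messages`.

    Rendering the full conversation with the chat template and slicing off
    the previously rendered prefix preserves separators such as the newline
    between <|im_end|> and <|im_start|>, which naive per-message
    tokenization drops.
    """
    prev_ids = tokenizer.apply_chat_template(
        prev_messages, tools=tools, tokenize=True
    )
    full_ids = tokenizer.apply_chat_template(
        prev_messages + new_messages,
        tools=tools,
        add_generation_prompt=True,
        tokenize=True,
    )
    # Assumes the template is prefix-stable across appended turns.
    assert full_ids[: len(prev_ids)] == prev_ids
    return full_ids[len(prev_ids):]
```

Slicing the full render against the previous prefix keeps the multi-turn token stream identical to what the serving engine sees, which per-message tokenization does not guarantee.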

Related Issue

Fixes #(issue)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

Copilot AI review requested due to automatic review settings December 17, 2025 03:26
@gemini-code-assist (bot) commented

Summary of Changes

Hello @yulangz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers crucial bug fixes for the Proxy mode, enhancing its stability and correctness. It resolves issues related to tool invocation, token limit enforcement in multi-turn dialogues, and accurate parsing of model outputs, including the correct handling of reasoning and tool calls, and the removal of unnecessary tokens. These changes are vital for reliable model training and interaction within the Proxy environment.

Highlights

  • Tool Compatibility in Concat Mode: The token concatenation logic has been fixed to correctly handle tools and tool_call messages, enabling the model to invoke tools properly in concat mode.
  • Max Token Restrictions: The max_tokens limitation in multi-turn conversations has been addressed, ensuring that the total token count adheres to gconfig.max_tokens and mb_spec.max_tokens_per_mb, preventing training failures (see the sketch after this list).
  • Reasoning Parser for Tool Calls: The reasoning_parser is now correctly used when processing tool calls in the model's output, preventing content that matches the tool call format from being incorrectly processed as a tool_call.
  • Extraneous EOS Token Removal: An issue where the returned text contained an extraneous End-Of-Sequence (EOS) token has been resolved, leading to cleaner model outputs.
  • Chat Template Adherence: The concatenation process in concat mode now strictly adheres to the chat_template, fixing issues like the missing newline between <|im_end|> and <|im_start|> in models like Qwen3.
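
The max-token enforcement can be pictured as clamping each request's generation budget. A rough sketch, where the name `engine_max_tokens` follows the PR's description but the surrounding logic is a simplification, not the actual client.py code:

```python
# Illustrative sketch of per-request max_tokens clamping; an assumption for
# exposition, not the repository's implementation.
def clamp_max_tokens(
    requested_max_tokens: int | None,
    prompt_len: int,
    engine_max_tokens: int,
) -> int:
    """Cap generation so prompt + output never exceeds the engine limit."""
    budget = engine_max_tokens - prompt_len
    if budget <= 0:
        raise ValueError(
            f"prompt ({prompt_len} tokens) already exceeds the "
            f"engine limit of {engine_max_tokens} tokens"
        )
    if requested_max_tokens is None:
        return budget
    return min(requested_max_tokens, budget)
```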

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces several important bug fixes for training in Proxy mode. The changes correctly handle tool calls in concatenation mode, enforce max_tokens restrictions to prevent training failures, and fix issues with extraneous EOS tokens in the output. A reasoning_parser is now properly utilized to avoid misinterpreting content within <think> blocks as tool calls. Additionally, the token concatenation logic has been improved to strictly follow the chat template.

My review focuses on the correctness of these fixes. The logic for handling token limits, parsing tool calls, and managing chat templates appears sound. I've also noted the addition of new unit tests, which is a great improvement. I have only a couple of minor suggestions to remove some leftover debug print statements.

Copilot AI left a comment


Pull request overview

This PR fixes critical issues in Proxy mode training related to tool calls, token limits, and EOS token handling. The changes enable proper tool invocation in concat mode, enforce max_tokens limits to prevent training failures, correctly parse reasoning content before extracting tool calls, and remove extraneous EOS tokens from output.

Key changes:

  • Enhanced tool call processing to use reasoning parser, preventing incorrect extraction of tool calls from reasoning content
  • Implemented proper max_tokens enforcement via engine_max_tokens parameter to prevent exceeding model token limits
  • Fixed token concatenation in concat mode to properly handle tools and follow chat templates (e.g., missing '\n' in Qwen3)
  • Added output_tokens_without_stop property to exclude EOS tokens from decoded output (sketched below)
  • Modified completions return structure to be session-scoped, enabling better dump functionality
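
The EOS fix can be sketched as a property that drops a trailing stop token before decoding. The property name comes from the PR; the dataclass around it is assumed for illustration, not the actual io_struct.py definition:

```python
from dataclasses import dataclass, field

# Rough sketch of an `output_tokens_without_stop` property; the dataclass
# and its fields are illustrative.
@dataclass
class CompletionOutput:
    output_tokens: list[int]
    stop_token_ids: set[int] = field(default_factory=set)

    @property
    def output_tokens_without_stop(self) -> list[int]:
        """Output tokens with a trailing stop/EOS token stripped, so
        decoding does not leak the EOS marker into the returned text."""
        toks = self.output_tokens
        if toks and toks[-1] in self.stop_token_ids:
            return toks[:-1]
        return toks
```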

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.

Show a summary per file:

  • areal/experimental/openai/tool_call_parser.py: Added reasoning_parser parameter to separate reasoning content from tool calls before extraction
  • areal/experimental/openai/client.py: Implemented engine_max_tokens enforcement, fixed concat mode to use chat_template with tools support, added debug prints, used output_tokens_without_stop for decoding
  • areal/experimental/openai/proxy.py: Added reasoning_parser and engine_max_tokens parameters, updated return type to be session-scoped, updated imports
  • areal/api/io_struct.py: Added output_tokens_without_stop property to exclude stop tokens from output, reformatted assertion
  • areal/api/cli_args.py: Added use_beam_search to unsupported OpenAI parameters list with clarifying comment
  • examples/experimental/proxy/train.py: Added dump functionality for session completions, updated completion merging logic, added aiofiles imports
  • examples/experimental/proxy/train_sep.py: Updated completion merging logic to match train.py, added reasoning_parser parameter
  • examples/experimental/proxy/gsm8k_multi_turn_agent.py: Fixed message serialization to use exclude_none=True for API compatibility
  • areal/tests/experimental/openai/test_tool_call_parser.py: Added comprehensive tests for tool call parsing with and without reasoning content
  • areal/experimental/openai/types.py: Reformatted multi-line assertion for better readability
  • areal/experimental/openai/cache.py: Reformatted multi-line assertion for better readability
  • areal/utils/proxy_utils.py: Added TYPE_CHECKING imports for better type hinting


      tools: list[Any],
  -   tool_call_parser: str | None,
  +   tool_call_parser: str,
  +   reasoning_parser: str,
Copilot AI commented Dec 17, 2025


The function signature declares reasoning_parser as str (non-nullable), but callers may pass None since self.reasoning_parser can be None (see line 160 and 399 in client.py where it's typed as str | None = None). When reasoning_parser is None, the call to ReasoningParser(reasoning_parser) on line 54 will likely fail. Either make the parameter nullable with proper None handling, or ensure all callers provide a non-None value.
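
A minimal, self-contained sketch of the suggested None handling; `ReasoningParser` below is a stand-in for the real class, and only the guard pattern is the point:

```python
import re

# Stand-in for the real ReasoningParser; assumes <think>...</think> syntax.
class ReasoningParser:
    def __init__(self, name: str):
        self.name = name  # e.g. "qwen3"; would select the block syntax

    def split(self, text: str) -> tuple[str | None, str]:
        """Return (reasoning, remainder)."""
        m = re.match(r"\s*<think>(.*?)</think>(.*)", text, re.DOTALL)
        if m:
            return m.group(1).strip(), m.group(2)
        return None, text

def split_reasoning(
    text: str, reasoning_parser: str | None
) -> tuple[str | None, str]:
    # Guard against None instead of assuming the annotation's `str`.
    if reasoning_parser is None:
        return None, text
    return ReasoningParser(reasoning_parser).split(text)
```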

A collaborator commented:

Please take Copilot's suggestion and check the scenarios where reasoning_parser is None.

@yulangz removed and re-applied the safe-to-test (Ready to run unit-tests in a PR.) label on Dec 19, 2025
@nuzant (Collaborator) commented Dec 22, 2025

/gemini review

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a comprehensive set of fixes and improvements for training in Proxy mode. The changes address critical issues with tool compatibility, token restrictions, and EOS token handling. Key improvements include a more robust token concatenation logic using chat_template, proper enforcement of max_tokens with graceful error handling, and correct parsing of tool calls by using a reasoning_parser. The addition of extensive unit tests for the new logic is a significant plus. Overall, this is a high-quality contribution that greatly improves the stability and correctness of the proxy mode. I have one suggestion to reduce minor code duplication in the error handling logic.

@yulangz (Collaborator, Author) commented Dec 22, 2025

/gemini review

@gemini-code-assist (bot) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive set of fixes and improvements for training in Proxy mode, particularly enhancing compatibility with tool usage, max_token restrictions, and EOS token handling. The changes are robust, well-tested, and significantly improve the reliability of multi-turn conversations in concat mode. Key changes include a rewrite of the prompt concatenation logic to correctly use chat templates, a fix in the tool call parser to handle reasoning blocks, and better error handling. I've identified a critical issue in the reasoning block detection logic that could lead to incorrect text parsing and have suggested a more robust implementation. I also noted a minor code duplication that could be refactored for better maintainability. Overall, this is an excellent PR that addresses several important issues.

@nuzant and @yulangz removed and re-applied the safe-to-test label several times on Dec 22, 2025
mark skip for failing test due to pad_token issue
@yulangz removed and re-applied the safe-to-test label on Dec 22, 2025
@nuzant nuzant merged commit fa0eb7b into main Dec 23, 2025
7 checks passed
@nuzant nuzant deleted the wht/feature/fix_stop_tokens_of_generate_in_proxy_mode branch December 23, 2025 02:13

Labels

safe-to-test Ready to run unit-tests in a PR.


5 participants