
Conversation

@dhh1995 (Collaborator) commented Dec 17, 2025

Description

This example demonstrates how to train customer service agents on $\tau^2$-Bench with AReaL's PPO/GRPO training pipeline. $\tau^2$-Bench provides realistic customer service simulation environments across multiple domains (retail, airline, telecom) in which agents must fulfill user requests both by calling agent tools and by guiding users to use their own tools.


Curve for training reward on telecom-small subset.

Related Issue

Fixes #(issue)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist (Contributor) commented
Summary of Changes

Hello @dhh1995, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, self-contained example for training customer service agents within the tau2-bench simulation framework, leveraging AReaL's reinforcement learning capabilities. It includes necessary adjustments to the proxy server's reward handling for improved robustness and flexibility, alongside a new workflow to orchestrate training episodes. The addition of dedicated configuration and utility files streamlines the setup and execution of these complex training scenarios, making it easier to reproduce and extend the example.

Highlights

  • New Example: Tau2-Bench Training: Introduced a comprehensive example demonstrating how to train customer service agents using the tau2-bench simulation environment with AReaL's PPO/GRPO training pipeline.
  • Enhanced Reward Setting Logic: Improved the reward setting mechanism in proxy.py to gracefully handle cases where an interaction ID is not explicitly provided, defaulting to the last interaction and adding robust error logging for missing interactions.
  • Flexible Reward Processing: Modified proxy_utils.py to allow the reward function to return additional information alongside the reward value, enhancing data capture during training episodes.
  • New Workflow for Episode Management: Implemented a Tau2Workflow in tau2_train.py to manage the execution of training episodes, including parallel processing of agent runs and structured dumping of simulation results.
  • Dedicated Configuration and Utilities: Added config.yaml for detailed configuration of the tau2-bench training and tau2_utils.py for defining data models and environment-specific settings, ensuring a modular and organized example.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new example for training customer service agents using the tau2-bench, which is a significant and valuable addition. The overall structure is well-designed, with clear separation of concerns between the agent logic, training script, and utilities. The modifications to existing proxy utilities, such as improved error handling and more flexible function returns, are also commendable. My review focuses on enhancing the new example files by addressing minor issues in documentation, improving code clarity, and suggesting small optimizations for better maintainability.


The code is modified from the [proxy](../experimental/proxy/README.md) example so that the training workflow (`tau2_train.py`) and the agent runner script (`tau2_agent.py`) can be decoupled, with common utilities in `tau2_utils.py`.

* `tau2_train.py`:

medium

The description for tau2_train.py is incomplete. Please add a brief explanation of its role to improve the documentation's clarity and help users understand the example's structure.


## Notes

1. When using litellm with multiprocessing, the `Queue bound to different event loop` error may occur. See also: [litellm issue #17813](https://github.com/BerriAI/litellm/issues/17813). This will not stop the training, but will make the outputs hard to read. You may use `grep -aivE "loop|queue|\^|asyncio|litellm"` to filter out the error messages before this issue is fixed.

medium

There appears to be a typo in the litellm issue number. Issue #17813 does not exist. The correct issue number is likely #1781, which discusses the Queue bound to different event loop error. Please correct the link to ensure it points to the correct resource.

Suggested change
1. When using litellm with multiprocessing, the `Queue bound to different event loop` error may occur. See also: [litellm issue #17813](https://github.com/BerriAI/litellm/issues/17813). This will not stop the training, but will make the outputs hard to read. You may use `grep -aivE "loop|queue|\^|asyncio|litellm"` to filter out the error messages before this issue is fixed.
1. When using litellm with multiprocessing, the `Queue bound to different event loop` error may occur. See also: [litellm issue #1781](https://github.com/BerriAI/litellm/issues/1781). This will not stop the training, but will make the outputs hard to read. You may use `grep -aivE "loop|queue|\^|asyncio|litellm"` to filter out the error messages before this issue is fixed.

Comment on lines +34 to +38
tasks: list[Task] = registry.get_tasks_loader(domain)(split)
for task in tasks:
    if task.id == task_id:
        return task
raise ValueError(f"No task found with id {task_id} for domain {domain}")

medium

The current implementation iterates through the list of tasks to find a match, which has a time complexity of O(n). For better performance, especially if the number of tasks grows, consider converting the list of tasks into a dictionary for O(1) lookups.

Suggested change
tasks: list[Task] = registry.get_tasks_loader(domain)(split)
for task in tasks:
    if task.id == task_id:
        return task
raise ValueError(f"No task found with id {task_id} for domain {domain}")
tasks: list[Task] = registry.get_tasks_loader(domain)(split)
task_map = {task.id: task for task in tasks}
if task_id not in task_map:
    raise ValueError(f"No task found with id {task_id} for domain {domain}")
return task_map[task_id]
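If the same domain/split pair is queried repeatedly, the dictionary itself can also be built once and memoized rather than rebuilt per lookup. A hedged sketch of that variant, with a stand-in `registry` (the real tau2-bench loader API may differ; only the loader shape is assumed here):

```python
from functools import lru_cache
from types import SimpleNamespace

# Stand-in for the real tau2-bench registry; only the loader call shape matters.
def _demo_loader(split):
    return [SimpleNamespace(id=f"{split}-{i}") for i in range(3)]

registry = SimpleNamespace(get_tasks_loader=lambda domain: _demo_loader)


@lru_cache(maxsize=None)
def _task_map(domain: str, split: str) -> dict:
    # Build the id -> task index once per (domain, split) pair.
    tasks = registry.get_tasks_loader(domain)(split)
    return {task.id: task for task in tasks}


def get_task(domain: str, split: str, task_id: str):
    task_map = _task_map(domain, split)
    if task_id not in task_map:
        raise ValueError(f"No task found with id {task_id} for domain {domain}")
    return task_map[task_id]
```

This trades a small amount of memory for O(1) lookups on every call after the first.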

Comment on lines +68 to +78
# * Backup: use acreate to replace acompletion
# async def _acreate(*args, **kwargs):
#     kwargs.pop("num_retries", None)
#     completion = await client.chat.completions.create(*args, **kwargs)
#     return completion

# async def _acreate_with_base_url(*args, **kwargs):
#     kwargs.pop("num_retries", None)
#     async with AsyncOpenAI(base_url=self.econfig.user_llm_base_url) as client:
#         completion = await client.chat.completions.create(*args, **kwargs)
#     return completion

medium

This block of commented-out code appears to be a backup or alternative implementation. To improve code clarity and maintainability, it's best to remove such code. If this logic is important for reference, consider moving it to the PR description or a separate document.


# Dump info to file
if "task_id" in data:
    real_task_id = data["task_id"][:120] + "-" + task_id

medium

The slice [:120] uses a magic number to truncate the task ID. This could be confusing for future readers. To improve readability and maintainability, please add a comment explaining why the task ID is being truncated (e.g., to prevent overly long filenames), or define 120 as a named constant.
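Following the suggestion above, the truncation could be named and documented. A hypothetical sketch (the constant name, its rationale, and the `dump_filename` helper are illustrative, not the PR's actual code):

```python
# Hypothetical named constant replacing the bare 120: truncating the task ID
# keeps the combined dump filename comfortably under common 255-byte
# filesystem name limits.
MAX_TASK_ID_PREFIX_LEN = 120


def dump_filename(data: dict, task_id: str) -> str:
    """Build a dump-file identifier from the data's task_id plus a suffix."""
    if "task_id" in data:
        return data["task_id"][:MAX_TASK_ID_PREFIX_LEN] + "-" + task_id
    return task_id
```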
