Checklist
- This feature will maintain backward compatibility with the current APIs in areal/api/. If not, please raise a refactor issue first.
Background
Is your feature request related to an enhancement or a new use case? Please describe.
Enhancement Request: I'd like to provide a complete Tau2 environment configuration and usage examples for running Reinforcement Learning (RL) experiments with AEnvironment. This would let researchers and developers integrate Tau2 benchmark tasks into their RL training pipelines with minimal setup.
About AEnvironment: AEnvironment (AEnv) is Ant Group's internal environment engineering infrastructure and is deeply integrated with AReaL. It is an Environment-as-Code development framework that lets developers define reusable environments in Python for agent construction and reinforcement learning training. The framework supports the Model Context Protocol (MCP) and enables one-click deployment to the cloud.
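For concreteness, the kind of Environment-as-Code definition this would enable might look roughly like the sketch below. Since AEnvironment is internal, the names here (`Tau2RetailEnv`, `StepResult`, `reset`, `step`) are hypothetical and follow a generic Gym-style interface rather than the real AEnv API.

```python
# Hypothetical sketch only: AEnvironment's real API is not public, so the
# class/method names below are illustrative and follow a generic Gym-style
# reset/step interface.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepResult:
    observation: str                    # tool output or simulated-user message
    reward: float                       # task reward, e.g. 1.0 on success
    done: bool                          # whether the Tau2 episode has ended
    info: dict[str, Any] = field(default_factory=dict)


class Tau2RetailEnv:
    """Illustrative Tau2 'retail'-domain environment (names are hypothetical)."""

    def __init__(self, task_id: str, max_turns: int = 10):
        self.task_id = task_id
        self.max_turns = max_turns
        self._turns = 0

    def reset(self) -> str:
        """Start a new episode and return the initial user instruction."""
        self._turns = 0
        return f"[task {self.task_id}] user: I'd like to return my last order."

    def step(self, agent_action: str) -> StepResult:
        """Apply one agent action (a reply or tool call) and return the outcome."""
        self._turns += 1
        done = self._turns >= self.max_turns or "close_ticket" in agent_action
        reward = 1.0 if done and "refund" in agent_action else 0.0
        return StepResult(observation="user: thanks, that works.",
                          reward=reward, done=done)
```

An actual implementation would dispatch tool calls to Tau2's domain simulators and expose them over MCP; the stub above only shows the surface an RL trainer would interact with.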
Potential Solution
A clear and concise description of the potential implementation or how similar features are handled in other frameworks.
The solution involves:
- Tau2 Experiment Configuration Files: Provide production-ready configuration files for RL experiments, including AReaL integration configs, training hyperparameters, and evaluation settings (see the configuration sketch below).
- Usage Examples: Create comprehensive examples demonstrating how to build and use the Tau2 environment on top of AEnvironment (see the rollout sketch below), including:
- Basic environment initialization and tool usage
- Complete integration with AReaL for RL training
- Custom agent development patterns
- Documentation: Enhance the existing documentation with step-by-step guides for setting up the Tau2 environment for RL experiments.
This approach follows patterns seen in OpenAI Gym/Universe (standardized environment interface for RL) and Ray RLlib (environment wrappers and configuration management).
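As a starting point for the configuration-files item, here is one possible shape for a Tau2 experiment config, sketched as a Python dataclass. The field names and default values are assumptions for discussion only; they do not correspond to the actual AReaL config schema under areal/api/.

```python
# Illustrative only: field names and defaults are placeholders, not the
# real AReaL configuration schema.
from dataclasses import dataclass


@dataclass
class Tau2ExperimentConfig:
    # Environment settings
    domain: str = "retail"            # Tau2 domain, e.g. retail / airline / telecom
    max_turns: int = 10               # cap on turns per episode
    # Rollout settings
    num_env_workers: int = 16         # parallel environment instances
    episodes_per_update: int = 256    # episodes collected per training update
    # Training hyperparameters
    learning_rate: float = 1e-6
    kl_coef: float = 0.05
    ppo_epochs: int = 1
    # Evaluation settings
    eval_every_n_updates: int = 10
    eval_episodes: int = 64
```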
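And for the usage-examples item, a rough sketch of the rollout loop an RL trainer could run against such an environment. `policy.generate` and `trainer.update` are stand-ins for whatever generation and update interfaces AReaL exposes, and `cfg` refers to the hypothetical config sketched above.

```python
# Hypothetical integration sketch: `policy` and `trainer` are placeholders
# for the model and RL algorithm objects provided by AReaL.
def collect_episode(env, policy):
    """Roll out one Tau2 episode; return a list of (observation, action, reward)."""
    trajectory = []
    obs, done = env.reset(), False
    while not done:
        action = policy.generate(obs)              # LLM emits a reply or tool call
        result = env.step(action)
        trajectory.append((obs, action, result.reward))
        obs, done = result.observation, result.done
    return trajectory


def train(env, policy, trainer, cfg):
    """Alternate between collecting Tau2 rollouts and running policy updates."""
    for update in range(1, 101):
        batch = [collect_episode(env, policy) for _ in range(cfg.episodes_per_update)]
        trainer.update(batch)                      # e.g. a PPO/GRPO step on the batch
        if update % cfg.eval_every_n_updates == 0:
            mean_final_reward = sum(t[-1][2] for t in batch) / len(batch)
            print(f"update {update}: mean final reward {mean_final_reward:.3f}")
```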
Additional Information
(Add any relevant context, references, or supporting data here.)