feat(verl): add unexpected tool call filtering #467
base: main
@microsoft-github-policy-service agree company="Gwangju Institute of Science and Technology"
Add filtering for "unexpected tool call" turns where the model continues generating after a tool call instead of stopping at </tool_call><|im_end|>. This helps prevent entropy explosion during GRPO training.

Changes:
- daemon.py: Add _setup_tool_call_filter(), _count_invalid_turns(), _filter_invalid_turns(), and void turn filtering
- config.yaml: Add filter_unexpected_tool_calls option (default: False)
- trainer.py: Fix missing gts parameter in _dump_generations()
- examples/calc_x/train_calc_agent.py: Add --filter-unexpected-tool-calls CLI flag

Key improvements over Youtu branch:
- Uses apply_chat_template() for model-agnostic token detection
- Supports multiple valid endings (eos_token, pad_token variants)
- Uses calculator tool example for calc-x consistency

Reference: contrib/youtu-agent-lightning branch
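The check described above can be sketched on plain token-ID lists. This is a minimal illustration, not the code in daemon.py: the names `is_unexpected_tool_call` and `filter_turns` and all token IDs are hypothetical, and the PR itself derives the valid endings from the tokenizer via apply_chat_template() rather than hard-coding them.

```python
from typing import List, Sequence, Tuple


def _find(seq: Sequence[int], sub: Sequence[int]) -> int:
    """Return the index of the first occurrence of sub in seq, or -1."""
    n, m = len(seq), len(sub)
    for i in range(n - m + 1):
        if list(seq[i:i + m]) == list(sub):
            return i
    return -1


def is_unexpected_tool_call(
    response_ids: Sequence[int],
    tool_call_end: Sequence[int],
    valid_endings: Sequence[Sequence[int]],
) -> bool:
    """True if the turn closes a tool call but does not stop at a valid ending."""
    pos = _find(response_ids, tool_call_end)
    if pos < 0:
        return False  # no tool call in this turn, nothing to check
    # Everything generated after the first tool-call close tag
    tail = list(response_ids[pos + len(tool_call_end):])
    # The turn is valid only if the tail is exactly one of the allowed endings
    return not any(tail == list(end) for end in valid_endings)


def filter_turns(
    turns: Sequence[Sequence[int]],
    tool_call_end: Sequence[int],
    valid_endings: Sequence[Sequence[int]],
) -> Tuple[List[Sequence[int]], int]:
    """Drop unexpected-tool-call turns; return (kept turns, number dropped)."""
    kept = [
        t for t in turns
        if not is_unexpected_tool_call(t, tool_call_end, valid_endings)
    ]
    return kept, len(turns) - len(kept)


# Made-up token IDs, for illustration only
TOOL_CALL_END = [7, 8]          # stands in for "</tool_call>"
VALID_ENDINGS = [[2], [2, 0]]   # stands in for "<|im_end|>" with an optional pad token

turns = [
    [1, 7, 8, 2],        # tool call, then stops: kept
    [1, 7, 8, 2, 5, 6],  # keeps generating after the tool call: dropped
    [3, 4, 2],           # no tool call: kept
    [1, 7, 8, 2, 0],     # tool call, stop token, pad: kept
]
kept, n_dropped = filter_turns(turns, TOOL_CALL_END, VALID_ENDINGS)
```

Matching the tail against an explicit list of endings is one way to accommodate the "multiple valid endings" mentioned in the description; a real implementation would build that list from the tokenizer's actual special tokens.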
7a5ee47 to 555d1fc
    and self.trace_aggregator.get("debug", False)
    else {}
    ),
    "training/n_unexpected_tool_calls": n_unexpected_tool_calls,
Small comment: only log these metrics when self.tool_parser is not None.
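One way to follow this suggestion is to add the metric to the logged dictionary only when a tool parser is configured. The sketch below is an assumption about shape, not the PR's code: `build_training_metrics` and its parameters are hypothetical names, and only the metric key comes from the diff.

```python
from typing import Any, Dict, Optional


def build_training_metrics(
    base_metrics: Dict[str, Any],
    tool_parser: Optional[object],
    n_unexpected_tool_calls: int,
) -> Dict[str, Any]:
    """Return the metrics to log, adding tool-call stats only when filtering is active."""
    metrics = dict(base_metrics)  # copy so the caller's dict is not mutated
    if tool_parser is not None:
        # Without a parser the count is meaningless, so the key is omitted entirely
        metrics["training/n_unexpected_tool_calls"] = n_unexpected_tool_calls
    return metrics
```

Omitting the key, rather than logging a constant zero, keeps dashboards free of a flat line when the filter is disabled.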
    import agentlightning as agl
    from agentlightning.env_var import LightningEnvVar, resolve_bool_env_var, resolve_str_env_var

    # Ensure venv bin is in PATH (needed for uvx/mcp-server-calculator in Ray workers)
This file contains some unnecessary changes. I think only the related config should be included here.
    filter_unexpected_tool_calls: bool = False,
    experiment_name: Optional[str] = None,
    n_gpus: int = 1,
    checkpoint_dir: str = "/home/jovyan/msra/experiments/checkpoints",
Could you please explain this line? The path seems to belong to someone else.
    "--checkpoint-dir",
    type=str,
    default="/home/jovyan/msra/experiments/checkpoints",
    help="Directory to save checkpoints (default: /home/jovyan/msra/experiments/checkpoints)",
Also here.
Thank you for your careful review and for raising this question.
To clarify, /home/jovyan is not a specific person's directory—it is the default home directory name on the OpenHPC server provided by my university (GIST). The msra folder is my personal working directory that I created specifically for this project, which is also linked to my GitHub repository.
I have attached screenshots of my university's HPC-AI Service Portal as evidence. As you can see, /home/jovyan is the default home directory automatically assigned when a workspace is created on this server.
I attached the training code without modification because I wanted to transparently show exactly how the experiments were conducted. However, I realize now that I should have cleaned up these internal file paths before submission. I apologize for any confusion this may have caused—this is my first time collaborating with an industry partner, and I was not aware this could raise concerns.
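For illustration, a machine-independent default could look like the sketch below. It is only an assumption about how the CLI might be cleaned up: the relative default `checkpoints` and the `build_parser` helper are hypothetical, while the two flag names are taken from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing the two options discussed in this PR."""
    parser = argparse.ArgumentParser(description="Sketch of the training CLI options")
    parser.add_argument(
        "--filter-unexpected-tool-calls",
        action="store_true",  # off by default, matching the config default of False
        help="Drop turns that keep generating after a tool call",
    )
    parser.add_argument(
        "--checkpoint-dir",
        type=str,
        default="checkpoints",  # assumed relative default, not tied to one machine
        help="Directory to save checkpoints (default: checkpoints)",
    )
    return parser


# Users on a specific cluster pass their own path explicitly
args = build_parser().parse_args(
    ["--filter-unexpected-tool-calls", "--checkpoint-dir", "/tmp/run1"]
)
```

With this shape, a site-specific path such as the one used in the experiments is supplied on the command line instead of being baked into the example script.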
Summary
- Filter "unexpected tool call" turns, where the model keeps generating after a tool call (instead of stopping at </tool_call><|im_end|>)
- Add training/unexpected_tool_call_ratio metric for monitoring
- Fix missing gts parameter in validation data dump

Configuration
YAML (agentlightning/verl/config.yaml):
CLI:
    python examples/calc_x/train_calc_agent.py --filter-unexpected-tool-calls

Verification
    cd examples/calc_x
Filter OFF (baseline)
Filter ON