feat(verl): add unexpected tool call filtering #467

iamseungpil · 2026-01-25T14:21:35Z

Summary

- Add filtering for turns with malformed tool call endings (missing </tool_call><|im_end|>)
- Track training/unexpected_tool_call_ratio metric for monitoring
- Fix missing gts parameter in validation data dump

Configuration

YAML (agentlightning/verl/config.yaml):

agentlightning:
  trace_aggregator:
    filter_unexpected_tool_calls: true  # default: false

CLI:
python examples/calc_x/train_calc_agent.py --filter-unexpected-tool-calls
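
As a rough illustration of how a boolean flag like this is commonly wired with argparse and mapped onto the YAML key above, here is a minimal sketch. The `build_parser` and `to_config_overrides` helpers are hypothetical names introduced for this example; the actual wiring in train_calc_agent.py may differ.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the flag discussed in this PR."""
    parser = argparse.ArgumentParser(description="Sketch of the calc_x CLI flag")
    # store_true makes the flag default to False, matching the YAML default.
    parser.add_argument(
        "--filter-unexpected-tool-calls",
        action="store_true",
        help="Drop turns whose tool call does not end cleanly",
    )
    return parser


def to_config_overrides(args: argparse.Namespace) -> Dict[str, Any]:
    """Hypothetical helper: mirror the YAML layout shown above."""
    return {
        "agentlightning": {
            "trace_aggregator": {
                "filter_unexpected_tool_calls": args.filter_unexpected_tool_calls,
            }
        }
    }


if __name__ == "__main__":
    print(to_config_overrides(build_parser().parse_args()))
```

With `store_true`, omitting the flag leaves the filter off, so existing invocations keep the baseline behavior.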

Verification

cd examples/calc_x

Filter OFF (baseline)

python train_calc_agent.py \
    --train-file data/train.parquet \
    --val-file data/test.parquet \
    --experiment-name grpo_baseline

Filter ON

python train_calc_agent.py \
    --train-file data/train.parquet \
    --val-file data/test.parquet \
    --filter-unexpected-tool-calls \
    --experiment-name grpo_filter_on

iamseungpil · 2026-01-25T14:25:18Z

@microsoft-github-policy-service agree company="Gwangju Institute of Science and Technology"

Add filtering for "unexpected tool call" turns where the model continues generating after a tool call instead of stopping at </tool_call><|im_end|>. This helps prevent entropy explosion during GRPO training.

Changes:
- daemon.py: Add _setup_tool_call_filter(), _count_invalid_turns(), _filter_invalid_turns(), and void turn filtering
- config.yaml: Add filter_unexpected_tool_calls option (default: False)
- trainer.py: Fix missing gts parameter in _dump_generations()
- examples/calc_x/train_calc_agent.py: Add --filter-unexpected-tool-calls CLI flag

Key improvements over Youtu branch:
- Uses apply_chat_template() for model-agnostic token detection
- Supports multiple valid endings (eos_token, pad_token variants)
- Uses calculator tool example for calc-x consistency

Reference: contrib/youtu-agent-lightning branch
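
To make the filtering rule concrete, here is a simplified string-level sketch. It is not the code in daemon.py: the PR describes token-level detection via apply_chat_template(), whereas this example works on decoded text. The function names, the `VALID_SUFFIXES` tuple, and the `<|endoftext|>` variant are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

TOOL_CALL_OPEN = "<tool_call>"
TOOL_CALL_CLOSE = "</tool_call>"
# Assumed set of acceptable endings after the closing tag (illustrative only).
VALID_SUFFIXES = ("<|im_end|>", "<|im_end|><|endoftext|>")


def is_unexpected_tool_call(turn_text: str, valid_suffixes: Iterable[str] = VALID_SUFFIXES) -> bool:
    """Return True if the turn has a tool call but does not stop right after it."""
    open_idx = turn_text.rfind(TOOL_CALL_OPEN)
    if open_idx == -1:
        return False  # no tool call in this turn, nothing to flag
    close_idx = turn_text.rfind(TOOL_CALL_CLOSE)
    if close_idx < open_idx:
        return True  # the last tool call was never closed
    tail = turn_text[close_idx + len(TOOL_CALL_CLOSE):].strip()
    return tail not in {suffix.strip() for suffix in valid_suffixes}


def filter_turns(turns: List[str]) -> Tuple[List[str], float]:
    """Drop flagged turns and report the fraction that was dropped."""
    kept = [t for t in turns if not is_unexpected_tool_call(t)]
    ratio = (len(turns) - len(kept)) / len(turns) if turns else 0.0
    return kept, ratio
```

The returned ratio corresponds in spirit to the training/unexpected_tool_call_ratio metric mentioned in the summary; a token-level version would compare trailing token IDs instead of strings.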

JiahangXu · 2026-01-26T13:29:06Z

agentlightning/verl/daemon.py

                and self.trace_aggregator.get("debug", False)
                else {}
            ),
+            "training/n_unexpected_tool_calls": n_unexpected_tool_calls,


Small comment: only expose this logging metric when self.tool_parser is not None.
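
One way to read this suggestion is sketched below. The `build_metrics` helper and its arguments are hypothetical; only the metric key and the `tool_parser is not None` condition come from the diff and the comment above.

```python
from typing import Any, Dict, Optional


def build_metrics(
    base_metrics: Dict[str, Any],
    tool_parser: Optional[object],
    n_unexpected_tool_calls: int,
) -> Dict[str, Any]:
    """Return a copy of base_metrics, adding the tool call metric only when filtering is active."""
    metrics = dict(base_metrics)
    # Without a tool parser the count is meaningless, so the key is omitted entirely.
    if tool_parser is not None:
        metrics["training/n_unexpected_tool_calls"] = n_unexpected_tool_calls
    return metrics
```

Omitting the key, rather than logging a constant zero, keeps dashboards free of a metric that was never actually measured.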

XufangLuo · 2026-01-27T01:42:02Z

examples/calc_x/train_calc_agent.py

 import agentlightning as agl
 from agentlightning.env_var import LightningEnvVar, resolve_bool_env_var, resolve_str_env_var

+# Ensure venv bin is in PATH (needed for uvx/mcp-server-calculator in Ray workers)


This file contains some unnecessary changes. I think only the related config should be included here.

XufangLuo · 2026-01-27T01:43:26Z

examples/calc_x/train_calc_agent.py

+    filter_unexpected_tool_calls: bool = False,
+    experiment_name: Optional[str] = None,
+    n_gpus: int = 1,
+    checkpoint_dir: str = "/home/jovyan/msra/experiments/checkpoints",


Could you please explain this line? The path seems to belong to someone else.

XufangLuo · 2026-01-27T02:03:55Z

examples/calc_x/train_calc_agent.py

+        "--checkpoint-dir",
+        type=str,
+        default="/home/jovyan/msra/experiments/checkpoints",
+        help="Directory to save checkpoints (default: /home/jovyan/msra/experiments/checkpoints)",


iamseungpil · 2026-01-27

Thank you for your careful review and for raising this question.

To clarify, /home/jovyan is not a specific person's directory—it is the default home directory name on the OpenHPC server provided by my university (GIST). The msra folder is my personal working directory that I created specifically for this project, which is also linked to my GitHub repository.

I have attached screenshots of my university's HPC-AI Service Portal as evidence. As you can see, /home/jovyan is the default home directory automatically assigned when a workspace is created on this server.

I attached the training code without modification because I wanted to transparently show exactly how the experiments were conducted. However, I realize now that I should have cleaned up these internal file paths before submission. I apologize for any confusion this may have caused. This is my first time collaborating with an industry partner, and I was not aware this could raise concerns.
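
For reference, a machine-independent default for the checkpoint directory could look like the sketch below. The `resolve_checkpoint_dir` helper, the `CHECKPOINT_DIR` environment variable, and the relative `checkpoints` fallback are all assumptions made for this example and are not part of the PR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_checkpoint_dir(
    cli_value: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the checkpoint directory from the CLI, then the environment, then a relative default."""
    env = os.environ if env is None else env
    # CHECKPOINT_DIR is a hypothetical variable name used only in this sketch.
    raw = cli_value or env.get("CHECKPOINT_DIR") or "checkpoints"
    return Path(raw).expanduser()
```

With this pattern the `--checkpoint-dir` argument can default to `None`, so no user-specific absolute path needs to appear in the example script.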

iamseungpil force-pushed the feature/filter-unexpected-tool-calls branch from 7a5ee47 to 555d1fc, January 25, 2026 15:04

JiahangXu reviewed Jan 26, 2026

XufangLuo reviewed Jan 27, 2026
