@iamseungpil iamseungpil commented Jan 25, 2026

Summary

  • Add filtering for turns with malformed tool call endings (missing </tool_call><|im_end|>)
  • Track training/unexpected_tool_call_ratio metric for monitoring
  • Fix missing gts parameter in validation data dump
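
As a rough illustration of the first two items, the check and the ratio could look like the sketch below. It works on decoded text for readability. The helper names (`is_unexpected_tool_call`, `unexpected_tool_call_ratio`) and the assumption that each turn is a plain string are hypothetical and not taken from this PR, whose actual logic lives in daemon.py.

```python
from typing import Sequence

TOOL_CALL_OPEN = "<tool_call>"
EXPECTED_ENDING = "</tool_call><|im_end|>"


def is_unexpected_tool_call(response: str) -> bool:
    """Return True if a turn opens a tool call but does not stop at the expected ending."""
    if TOOL_CALL_OPEN not in response:
        return False  # not a tool-call turn, so there is nothing to check
    # Assumption: trailing whitespace after the ending is tolerated.
    return not response.rstrip().endswith(EXPECTED_ENDING)


def unexpected_tool_call_ratio(responses: Sequence[str]) -> float:
    """Fraction of turns flagged as malformed (0.0 for an empty batch)."""
    if not responses:
        return 0.0
    n_bad = sum(is_unexpected_tool_call(r) for r in responses)
    return n_bad / len(responses)


turns = [
    '<tool_call>{"name": "calculator"}</tool_call><|im_end|>',       # well-formed
    '<tool_call>{"name": "calculator"}</tool_call>It is 4<|im_end|>',  # keeps generating
    "The answer is 4<|im_end|>",                                      # no tool call
    '<tool_call>{"name": "calculator"}</tool_call>\nThe result',      # keeps generating
]
print(unexpected_tool_call_ratio(turns))
```

A value like this would be the kind of number reported under training/unexpected_tool_call_ratio.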

Configuration

YAML (agentlightning/verl/config.yaml):

agentlightning:
  trace_aggregator:
    filter_unexpected_tool_calls: true  # default: false

CLI:
python examples/calc_x/train_calc_agent.py --filter-unexpected-tool-calls
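
A minimal sketch of how a flag like this could map onto the YAML option above, assuming the script uses `argparse`. The function names `build_parser` and `to_config_override` are hypothetical, and the real train_calc_agent.py may wire the option differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch: only the filter flag is shown")
    parser.add_argument(
        "--filter-unexpected-tool-calls",
        action="store_true",  # absent -> False, which matches the YAML default
        help="Drop turns whose tool call does not end with </tool_call><|im_end|>",
    )
    return parser


def to_config_override(args: argparse.Namespace) -> dict:
    """Build a nested override mirroring the agentlightning.trace_aggregator YAML keys."""
    return {
        "agentlightning": {
            "trace_aggregator": {
                "filter_unexpected_tool_calls": args.filter_unexpected_tool_calls,
            }
        }
    }


args = build_parser().parse_args(["--filter-unexpected-tool-calls"])
print(to_config_override(args))
```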

Verification
cd examples/calc_x

Filter OFF (baseline)

python train_calc_agent.py \
    --train-file data/train.parquet \
    --val-file data/test.parquet \
    --experiment-name grpo_baseline

Filter ON

python train_calc_agent.py \
    --train-file data/train.parquet \
    --val-file data/test.parquet \
    --filter-unexpected-tool-calls \
    --experiment-name grpo_filter_on

@iamseungpil
Author

@microsoft-github-policy-service agree company="Gwangju Institute of Science and Technology"

Add filtering for "unexpected tool call" turns where the model continues
generating after a tool call instead of stopping at </tool_call><|im_end|>.
This helps prevent entropy explosion during GRPO training.

Changes:
- daemon.py: Add _setup_tool_call_filter(), _count_invalid_turns(),
  _filter_invalid_turns(), and void turn filtering
- config.yaml: Add filter_unexpected_tool_calls option (default: False)
- trainer.py: Fix missing gts parameter in _dump_generations()
- examples/calc_x/train_calc_agent.py: Add --filter-unexpected-tool-calls CLI flag

Key improvements over Youtu branch:
- Uses apply_chat_template() for model-agnostic token detection
- Supports multiple valid endings (eos_token, pad_token variants)
- Uses calculator tool example for calc-x consistency

Reference: contrib/youtu-agent-lightning branch
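
The "multiple valid endings" idea can be sketched at the token level as a suffix match against a set of allowed ending sequences. This is only an illustration: the token IDs below are made up, and `ends_with_any` is a hypothetical helper. In the PR the endings are reportedly derived from the tokenizer via apply_chat_template(), which is not reproduced here.

```python
from typing import Sequence


def ends_with_any(token_ids: Sequence[int], valid_endings: Sequence[Sequence[int]]) -> bool:
    """Return True if token_ids ends with at least one of the allowed ending sequences."""
    ids = list(token_ids)
    for ending in valid_endings:
        n = len(ending)
        if n and len(ids) >= n and ids[-n:] == list(ending):
            return True
    return False


# Hypothetical IDs for illustration only; real values depend on the tokenizer.
TOOL_CALL_END, IM_END, PAD = 1001, 1002, 1003
VALID_ENDINGS = [
    [TOOL_CALL_END, IM_END],  # </tool_call> followed by the eos-style token
    [TOOL_CALL_END, PAD],     # </tool_call> followed by the pad-style variant
]

well_formed = [5, 6, 7, TOOL_CALL_END, IM_END]
kept_generating = [5, 6, TOOL_CALL_END, 8, 9, IM_END]
print(ends_with_any(well_formed, VALID_ENDINGS), ends_with_any(kept_generating, VALID_ENDINGS))
```

Matching on token IDs rather than decoded strings avoids hard-coding the literal text of any one model's chat template.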
@iamseungpil iamseungpil force-pushed the feature/filter-unexpected-tool-calls branch from 7a5ee47 to 555d1fc Compare January 25, 2026 15:04
and self.trace_aggregator.get("debug", False)
else {}
),
"training/n_unexpected_tool_calls": n_unexpected_tool_calls,
Contributor

Small comment: only log these metrics when self.tool_parser is not None.
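
One way to read this suggestion, as a sketch: build the tool-call metrics in a small helper that returns nothing when no parser is configured. The helper name `build_tool_call_metrics` and the `n_turns` argument are hypothetical. Only `tool_parser` and the metric names come from this PR.

```python
from typing import Any, Dict, Optional


def build_tool_call_metrics(
    tool_parser: Optional[Any],
    n_unexpected_tool_calls: int,
    n_turns: int,
) -> Dict[str, float]:
    """Return the tool-call metrics, or an empty dict when no parser is set."""
    if tool_parser is None:
        return {}  # filter is not active, so do not log these metrics
    ratio = n_unexpected_tool_calls / n_turns if n_turns else 0.0
    return {
        "training/n_unexpected_tool_calls": n_unexpected_tool_calls,
        "training/unexpected_tool_call_ratio": ratio,
    }


metrics: Dict[str, float] = {"training/reward": 0.75}
metrics.update(build_tool_call_metrics(tool_parser=None, n_unexpected_tool_calls=3, n_turns=12))
print(metrics)  # tool-call keys are absent because tool_parser is None
```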

import agentlightning as agl
from agentlightning.env_var import LightningEnvVar, resolve_bool_env_var, resolve_str_env_var

# Ensure venv bin is in PATH (needed for uvx/mcp-server-calculator in Ray workers)

This file contains some unnecessary changes. I think only the related config should be included here.

filter_unexpected_tool_calls: bool = False,
experiment_name: Optional[str] = None,
n_gpus: int = 1,
checkpoint_dir: str = "/home/jovyan/msra/experiments/checkpoints",

Could you please explain this line? The path appears to belong to someone else.

"--checkpoint-dir",
type=str,
default="/home/jovyan/msra/experiments/checkpoints",
help="Directory to save checkpoints (default: /home/jovyan/msra/experiments/checkpoints)",

also here
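
A machine-independent default is one way to resolve both spots. The sketch below assumes the script uses `argparse` as in the snippet above. The `CHECKPOINT_DIR` environment variable, the `./checkpoints` fallback, and the helper name `add_checkpoint_dir_arg` are hypothetical choices, not something this PR defines.

```python
import argparse
import os


def add_checkpoint_dir_arg(parser: argparse.ArgumentParser) -> None:
    """Register --checkpoint-dir with a default that is not tied to one machine."""
    # Assumption: fall back to ./checkpoints unless CHECKPOINT_DIR is set.
    default_dir = os.environ.get("CHECKPOINT_DIR", os.path.join(".", "checkpoints"))
    parser.add_argument(
        "--checkpoint-dir",
        type=str,
        default=default_dir,
        # %(default)s lets argparse print the resolved default in --help.
        help="Directory to save checkpoints (default: %(default)s)",
    )


parser = argparse.ArgumentParser()
add_checkpoint_dir_arg(parser)
print(parser.parse_args(["--checkpoint-dir", "runs/ckpt"]).checkpoint_dir)
```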

Author

Thank you for your careful review and for raising this question.

To clarify, /home/jovyan is not a specific person's directory. It is the default home directory name on the OpenHPC server provided by my university (GIST). The msra folder is my personal working directory, which I created specifically for this project and which is linked to my GitHub repository.

I have attached screenshots of my university's HPC-AI Service Portal as evidence. As you can see, /home/jovyan is the default home directory automatically assigned when a workspace is created on this server.

I attached the training code without modification because I wanted to show transparently how the experiments were conducted. However, I now realize that I should have cleaned up these internal file paths before submission. I apologize for any confusion this may have caused. This is my first time collaborating with an industry partner, and I was not aware this could raise concerns.

Screenshot 2026-01-27 5:16:46 PM
