Added Qwen-2.5-vl Model support for 1-step VisionAgent and updated reproducibility results with MiniWob and WorkArena#246
amanjaiswal73892 wants to merge 5 commits into main
Review by Korbit AI
| Issue |
|---|
| Duplicated Word in Documentation |
| No Token Buffer Safety Margin |
| Assert used for production validation |
Files scanned
| File Path | Reviewed |
|---|---|
| src/agentlab/agents/__init__.py | ✅ |
| src/agentlab/agents/generic_agent/tmlr_config.py | ✅ |
| src/agentlab/agents/visual_agent/agent_configs.py | ✅ |
| src/agentlab/llm/llm_configs.py | ✅ |
| src/agentlab/agents/generic_agent/agent_configs.py | ✅ |
- TapeAgent: An agent that uses the Tape data structure to perform actions
- VisualAgent: An agent that uses visual observations to to perform actions
Duplicated Word in Documentation 
What is the issue?
There is a duplicated 'to' in the description of VisualAgent.
Why this matters
The duplicate word makes the documentation incorrect and unprofessional, affecting the clarity of the API documentation for users.
Suggested change ∙ Feature Preview
Replace the line with a correctly formatted description:
- VisualAgent: An agent that uses visual observations to perform actions
max_total_tokens=128_000,
max_input_tokens=120_000,
max_new_tokens=8_000,
No Token Buffer Safety Margin 
What is the issue?
The sum of max_input_tokens and max_new_tokens equals max_total_tokens exactly, which could lead to token limit errors at runtime.
Why this matters
When the actual input reaches near max_input_tokens, even slightly exceeding it due to tokenization differences could cause failures since there's no buffer in the total token limit.
Suggested change ∙ Feature Preview
Add a small buffer by reducing max_input_tokens or max_new_tokens to ensure total is less than max_total_tokens:
max_total_tokens=128_000,
max_input_tokens=119_000,  # Reduced to provide buffer
max_new_tokens=8_000,
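The margin this suggestion describes can be checked with simple arithmetic. The following is a minimal sketch, assuming a hypothetical helper named token_headroom that takes the three limits as plain integers; it is not part of AgentLab and only illustrates the comparison between the original and suggested values.

```python
def token_headroom(max_total_tokens: int, max_input_tokens: int, max_new_tokens: int) -> int:
    """Return the tokens left over after reserving the input and output budgets.

    Hypothetical helper for illustration only; not part of AgentLab.
    """
    return max_total_tokens - (max_input_tokens + max_new_tokens)


# Values from the diff under review: the two budgets fill the window exactly.
original = token_headroom(128_000, 120_000, 8_000)

# Values from the suggested change: some tokens remain as a buffer.
suggested = token_headroom(128_000, 119_000, 8_000)

print(original, suggested)
```

A headroom of zero means any overshoot in the input token count leaves no slack in the total limit, which is the situation the comment flags.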
def get_som_agent(llm_config: str):
    """Creates basic 1-step vision SOM agent"""
    assert llm_config in CHAT_MODEL_ARGS_DICT, f"Unsupported LLM config: {llm_config}"
Assert used for production validation 
What is the issue?
Using assert for input validation in a function that could be called from production code.
Why this matters
Assertions can be disabled with Python's -O flag in production, leaving the function vulnerable to invalid inputs without any error handling.
Suggested change ∙ Feature Preview
Replace assertion with explicit error handling:
if llm_config not in CHAT_MODEL_ARGS_DICT:
    raise ValueError(f"Unsupported LLM config: {llm_config}")
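To make the difference concrete, here is a minimal, self-contained sketch of the suggested validation. It assumes a stubbed CHAT_MODEL_ARGS_DICT with a placeholder entry and a hypothetical function name, validate_llm_config; neither reflects the real AgentLab registry or API.

```python
# Stand-in for the real registry; the key and value are placeholders.
CHAT_MODEL_ARGS_DICT = {"example/model": object()}


def validate_llm_config(llm_config: str) -> str:
    """Return llm_config if it is registered, otherwise raise ValueError.

    Hypothetical helper for illustration only; not part of AgentLab.
    """
    if llm_config not in CHAT_MODEL_ARGS_DICT:
        raise ValueError(f"Unsupported LLM config: {llm_config}")
    return llm_config


# An unknown key triggers the explicit error path.
try:
    validate_llm_config("not-a-model")
except ValueError as err:
    message = str(err)
else:
    message = None

print(message)
```

Unlike an assert statement, this check still runs when Python is started with the -O flag, so invalid keys are rejected in optimized mode as well.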
New Agents and Models:
- VisualAgent and GenericAgent configurations for the qwen2.5-vl-32b-instruct and qwen3-32b models in src/agentlab/agents/generic_agent/agent_configs.py and src/agentlab/agents/visual_agent/agent_configs.py.
- VISUAL_SOM_AGENT_LLAMA4_17B_INSTRUCT, using the SOM approach, in src/agentlab/agents/visual_agent/agent_configs.py.
Configuration Enhancements:
- src/agentlab/llm/llm_configs.py now includes new LLM configurations for qwen2.5-vl-32b-instruct and qwen3-32b. These configurations specify token limits and vision support.
- get_base_agent in src/agentlab/agents/generic_agent/tmlr_config.py creates base agent configurations, with validation for supported LLM configurations.
Codebase Improvements:
- src/agentlab/agents/__init__.py now includes a description of the new VisualAgent.
Description by Korbit AI
What change is being made?
Add support for the Qwen-2.5-vl model in the 1-step VisionAgent, and update reproducibility results with new benchmarks for MiniWob and WorkArena.
Why are these changes being made?
This update integrates the Qwen-2.5-vl model to enhance the visual processing capabilities of the 1-step VisionAgent, enabling it to perform better in tasks requiring vision support. Additionally, including new benchmark results from MiniWob and WorkArena ensures the reproducibility of the updated agents, reflecting the latest performance metrics and configurations.