Added Qwen-2.5-vl Model support for 1-step VisionAgent and updated reproducibility results with MiniWob and WorkArena by amanjaiswal73892 · Pull Request #246 · ServiceNow/AgentLab

amanjaiswal73892 · 2025-05-15T21:39:28Z

New Agents and Models:

Added VisualAgent and GenericAgent configurations for qwen2.5-vl-32b-instruct and qwen3-32b models in src/agentlab/agents/generic_agent/agent_configs.py and src/agentlab/agents/visual_agent/agent_configs.py.
Introduced VISUAL_SOM_AGENT_LLAMA4_17B_INSTRUCT using the SOM approach in src/agentlab/agents/visual_agent/agent_configs.py.

Configuration Enhancements:

Updated src/agentlab/llm/llm_configs.py to include new LLM configurations for qwen2.5-vl-32b-instruct and qwen3-32b. These configurations specify token limits and vision support.
Added a utility function get_base_agent in src/agentlab/agents/generic_agent/tmlr_config.py for creating base agent configurations with validation for supported LLM configurations.

Codebase Improvements:

Expanded the agent documentation in src/agentlab/agents/__init__.py to include a description of the new VisualAgent.

Description by Korbit AI

What change is being made?

Add support for the Qwen-2.5-vl model in the 1-step VisionAgent, and update reproducibility results with new benchmarks for MiniWob and WorkArena.

Why are these changes being made?

This update integrates the Qwen-2.5-vl model to enhance the visual processing capabilities of the 1-step VisionAgent, enabling it to perform better in tasks requiring vision support. Additionally, including new benchmark results from MiniWob and WorkArena ensures the reproducibility of the updated agents, reflecting the latest performance metrics and configurations.

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Status
	Duplicated Word in Documentation ▹ view
	No Token Buffer Safety Margin ▹ view
	Assert used for production validation ▹ view

Files scanned

File Path	Reviewed
src/agentlab/agents/__init__.py	✅
src/agentlab/agents/generic_agent/tmlr_config.py	✅
src/agentlab/agents/visual_agent/agent_configs.py	✅
src/agentlab/llm/llm_configs.py	✅
src/agentlab/agents/generic_agent/agent_configs.py	✅

korbit-ai · 2025-05-15T21:41:36Z

src/agentlab/agents/__init__.py


 - TapeAgent: An agent that uses the Tape data structure to perform actions

+- VisualAgent: An agent that uses visual observations to to perform actions


Duplicated Word in Documentation

What is the issue?

There is a duplicated 'to' in the description of VisualAgent.

Why this matters

The duplicate word makes the documentation incorrect and unprofessional, affecting the clarity of the API documentation for users.

Suggested change ∙ Feature Preview

Replace the line with a correctly formatted description:

- VisualAgent: An agent that uses visual observations to perform actions

korbit-ai · 2025-05-15T21:41:37Z

src/agentlab/llm/llm_configs.py

+        max_total_tokens=128_000,
+        max_input_tokens=120_000,
+        max_new_tokens=8_000,


No Token Buffer Safety Margin

What is the issue?

The sum of max_input_tokens and max_new_tokens equals max_total_tokens exactly, which could lead to token limit errors at runtime.

Why this matters

When the actual input approaches max_input_tokens, exceeding it even slightly because of tokenization differences could cause failures, since the total token limit leaves no buffer.

Suggested change ∙ Feature Preview

Add a small buffer by reducing max_input_tokens or max_new_tokens to ensure total is less than max_total_tokens:

        max_total_tokens=128_000,
        max_input_tokens=119_000,  # Reduced to provide buffer
        max_new_tokens=8_000,
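
The arithmetic behind this suggestion can be checked with a small sketch. The `TokenBudget` class and `check_budget` helper below are hypothetical names introduced only for illustration; they are not part of AgentLab, and the three fields simply mirror the values quoted in the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBudget:
    """Hypothetical stand-in for the token fields of an LLM config."""

    max_total_tokens: int
    max_input_tokens: int
    max_new_tokens: int

    def headroom(self) -> int:
        """Tokens left after reserving the input and output budgets."""
        return self.max_total_tokens - (self.max_input_tokens + self.max_new_tokens)


def check_budget(budget: TokenBudget, min_headroom: int = 0) -> None:
    """Raise ValueError if fewer than min_headroom spare tokens remain."""
    spare = budget.headroom()
    if spare < min_headroom:
        raise ValueError(
            f"Token budget leaves {spare} spare tokens, need at least {min_headroom}"
        )


# Values from the diff under review and from the suggested change.
original = TokenBudget(128_000, 120_000, 8_000)
suggested = TokenBudget(128_000, 119_000, 8_000)

print(original.headroom())   # 0
print(suggested.headroom())  # 1000
```

Whether a 1,000-token margin is enough depends on how closely the client-side token count matches the serving tokenizer, which this sketch does not model.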

korbit-ai · 2025-05-15T21:41:37Z

src/agentlab/agents/visual_agent/agent_configs.py

+
+def get_som_agent(llm_config: str):
+    """Creates basic 1-step vision SOM agent"""
+    assert llm_config in CHAT_MODEL_ARGS_DICT, f"Unsupported LLM config: {llm_config}"


Assert used for production validation

What is the issue?

Using assert for input validation in a function that could be called from production code.

Why this matters

Assertions can be disabled with Python's -O flag in production, leaving the function vulnerable to invalid inputs without any error handling.

Suggested change ∙ Feature Preview

Replace assertion with explicit error handling:

    if llm_config not in CHAT_MODEL_ARGS_DICT:
        raise ValueError(f"Unsupported LLM config: {llm_config}")
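
A self-contained sketch of the explicit check is shown below. `EXAMPLE_MODEL_ARGS` is a hypothetical registry standing in for `CHAT_MODEL_ARGS_DICT`, and `validate_llm_config` is an illustrative helper, not a function from AgentLab. Unlike an `assert`, the `raise` statement still runs when Python is started with `-O`.

```python
# Hypothetical registry standing in for CHAT_MODEL_ARGS_DICT; the key and
# value here are placeholders, not real AgentLab configurations.
EXAMPLE_MODEL_ARGS = {"example/model-a": {"vision_support": True}}


def validate_llm_config(llm_config: str) -> str:
    """Return llm_config unchanged, or raise ValueError if it is not registered."""
    if llm_config not in EXAMPLE_MODEL_ARGS:
        raise ValueError(f"Unsupported LLM config: {llm_config}")
    return llm_config


# A registered name passes through; an unknown name raises even under `python -O`.
print(validate_llm_config("example/model-a"))
try:
    validate_llm_config("no-such-model")
except ValueError as err:
    print(err)
```

Raising `ValueError` also lets callers catch a specific exception type instead of a generic `AssertionError`.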

amanjaiswal73892 and others added 4 commits May 8, 2025 12:18

add-qwen-models

f7bf47d

Fix the csv by removing " on some lines

bfc68db

added miniwob-results

9a84b80

Update reproducibility journal with new agent results and fix formatting

5bd690e

amanjaiswal73892 requested a review from recursix May 15, 2025 21:39

korbit-ai bot reviewed May 15, 2025

Enhance ChatModel to support reasoning extraction in responses

16af31b

amanjaiswal73892 closed this May 20, 2025

amanjaiswal73892 wants to merge 5 commits into main from aj/more-results
