Added Qwen-2.5-vl Model support for 1-step VisionAgent and updated reproducibility results with MiniWob and WorkArena #246

Closed
amanjaiswal73892 wants to merge 5 commits into main from aj/more-results

amanjaiswal73892 (Collaborator) commented on May 15, 2025

New Agents and Models:

  • Added VisualAgent and GenericAgent configurations for qwen2.5-vl-32b-instruct and qwen3-32b models in src/agentlab/agents/generic_agent/agent_configs.py and src/agentlab/agents/visual_agent/agent_configs.py.
  • Introduced VISUAL_SOM_AGENT_LLAMA4_17B_INSTRUCT using the SOM approach in src/agentlab/agents/visual_agent/agent_configs.py.

Configuration Enhancements:

  • Updated src/agentlab/llm/llm_configs.py to include new LLM configurations for qwen2.5-vl-32b-instruct and qwen3-32b. These configurations specify token limits and vision support.
  • Added a utility function get_base_agent in src/agentlab/agents/generic_agent/tmlr_config.py for creating base agent configurations with validation for supported LLM configurations.
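
To make the two configuration bullets above concrete, here is a minimal, hypothetical sketch of an LLM configuration entry that records token limits and vision support, together with a base-agent helper that validates its argument. `LLMConfigSketch`, `LLM_CONFIGS_SKETCH`, `get_base_agent_sketch`, the registry key, and the numeric values are all assumptions made for illustration; they are not taken from `llm_configs.py` or `tmlr_config.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfigSketch:
    """Hypothetical stand-in for one LLM configuration entry."""

    model_name: str
    max_total_tokens: int
    max_input_tokens: int
    max_new_tokens: int
    vision_support: bool = False


# Hypothetical registry; the key and the limits are placeholder values.
LLM_CONFIGS_SKETCH = {
    "example/qwen2.5-vl-32b-instruct": LLMConfigSketch(
        model_name="qwen2.5-vl-32b-instruct",
        max_total_tokens=128_000,
        max_input_tokens=119_000,
        max_new_tokens=8_000,
        vision_support=True,
    ),
}


def get_base_agent_sketch(llm_config: str) -> dict:
    """Return a hypothetical base-agent description for a supported config."""
    # Validate explicitly so unsupported keys fail with a clear error.
    if llm_config not in LLM_CONFIGS_SKETCH:
        raise ValueError(f"Unsupported LLM config: {llm_config}")
    return {
        "llm_config": llm_config,
        "chat_model_args": LLM_CONFIGS_SKETCH[llm_config],
    }


agent = get_base_agent_sketch("example/qwen2.5-vl-32b-instruct")
print(agent["chat_model_args"].vision_support)
```

Keeping the limits and the vision flag in one immutable record means an agent configuration only needs to carry the registry key, and a mistyped key is rejected before any experiment starts.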

Codebase Improvements:

  • Expanded the agent documentation in src/agentlab/agents/__init__.py to include a description of the new VisualAgent.

Description by Korbit AI

What change is being made?

Add support for the Qwen-2.5-vl model in the 1-step VisionAgent, and update reproducibility results with new benchmarks for MiniWob and WorkArena.

Why are these changes being made?

This update integrates the Qwen-2.5-vl model to enhance the visual processing capabilities of the 1-step VisionAgent, enabling it to perform better in tasks requiring vision support. Additionally, including new benchmark results from MiniWob and WorkArena ensures the reproducibility of the updated agents, reflecting the latest performance metrics and configurations.

amanjaiswal73892 requested a review from recursix on May 15, 2025 at 21:39
korbit-ai bot left a comment
Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Issues found:

  • Documentation: Duplicated Word in Documentation
  • Performance: No Token Buffer Safety Margin
  • Error Handling: Assert used for production validation

Files scanned:

  • src/agentlab/agents/__init__.py
  • src/agentlab/agents/generic_agent/tmlr_config.py
  • src/agentlab/agents/visual_agent/agent_configs.py
  • src/agentlab/llm/llm_configs.py
  • src/agentlab/agents/generic_agent/agent_configs.py

- TapeAgent: An agent that uses the Tape data structure to perform actions

- VisualAgent: An agent that uses visual observations to to perform actions

Duplicated Word in Documentation (category: Documentation)

What is the issue?

There is a duplicated 'to' in the description of VisualAgent.

Why this matters

The duplicate word makes the documentation incorrect and unprofessional, affecting the clarity of the API documentation for users.

Suggested change ∙ Feature Preview

Replace the line with a correctly formatted description:

- VisualAgent: An agent that uses visual observations to perform actions

Comment on lines +212 to +214
max_total_tokens=128_000,
max_input_tokens=120_000,
max_new_tokens=8_000,

No Token Buffer Safety Margin (category: Performance)

What is the issue?

The sum of max_input_tokens and max_new_tokens equals max_total_tokens exactly, which could lead to token limit errors at runtime.

Why this matters

When the actual input approaches max_input_tokens, exceeding it even slightly (for example, because of tokenization differences) could cause a failure, since the total token limit leaves no headroom.

Suggested change ∙ Feature Preview

Add a small buffer by reducing max_input_tokens or max_new_tokens to ensure total is less than max_total_tokens:

max_total_tokens=128_000,
max_input_tokens=119_000,  # Reduced to provide buffer
max_new_tokens=8_000,
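
The arithmetic behind this suggestion can be checked with a few lines of Python. The helper name `token_headroom` is hypothetical and not part of the code under review; it simply subtracts the input and output budgets from the total limit.

```python
def token_headroom(max_total_tokens: int, max_input_tokens: int, max_new_tokens: int) -> int:
    """Return the tokens left unallocated once input and output budgets are reserved."""
    return max_total_tokens - (max_input_tokens + max_new_tokens)


# Values as submitted in the pull request: the budgets sum exactly to the total.
original_headroom = token_headroom(128_000, 120_000, 8_000)

# Values from the suggested change: the input budget is reduced by 1,000 tokens.
suggested_headroom = token_headroom(128_000, 119_000, 8_000)

print(original_headroom, suggested_headroom)
```

With the submitted values the headroom is 0 tokens; with the suggested values it is 1,000 tokens. Whether 1,000 tokens is enough depends on how far the tokenizer used for counting can diverge from the one used by the serving endpoint, which this sketch does not measure.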

def get_som_agent(llm_config: str):
    """Creates basic 1-step vision SOM agent"""
    assert llm_config in CHAT_MODEL_ARGS_DICT, f"Unsupported LLM config: {llm_config}"

Assert used for production validation (category: Error Handling)

What is the issue?

The function uses assert for input validation, even though it could be called from production code.

Why this matters

Assertions can be disabled with Python's -O flag in production, leaving the function vulnerable to invalid inputs without any error handling.

Suggested change ∙ Feature Preview

Replace assertion with explicit error handling:

if llm_config not in CHAT_MODEL_ARGS_DICT:
    raise ValueError(f"Unsupported LLM config: {llm_config}")
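
The difference between the two forms can be demonstrated in a small self-contained sketch. `SUPPORTED_CONFIGS_SKETCH` and both function names are hypothetical stand-ins for `CHAT_MODEL_ARGS_DICT` and `get_som_agent`; the built-in `compile(..., optimize=1)` is used here to approximate what running Python with `-O` does to `assert` statements.

```python
# Hypothetical stand-in for CHAT_MODEL_ARGS_DICT.
SUPPORTED_CONFIGS_SKETCH = {"example-model": object()}


def validate_with_raise(llm_config: str) -> str:
    """Explicit validation: active regardless of interpreter flags."""
    if llm_config not in SUPPORTED_CONFIGS_SKETCH:
        raise ValueError(f"Unsupported LLM config: {llm_config}")
    return llm_config


# Source of an assert-based validator, compiled the way -O would compile it.
ASSERT_SOURCE = (
    "def validate_with_assert(llm_config):\n"
    "    assert llm_config in supported, f'Unsupported LLM config: {llm_config}'\n"
    "    return llm_config\n"
)
namespace = {"supported": SUPPORTED_CONFIGS_SKETCH}
exec(compile(ASSERT_SOURCE, "<sketch>", "exec", optimize=1), namespace)

# The assert has been stripped, so the invalid key passes through silently.
leaked = namespace["validate_with_assert"]("not-a-model")

# The explicit check still rejects the same key.
try:
    validate_with_raise("not-a-model")
    rejected = False
except ValueError:
    rejected = True

print(leaked, rejected)
```

The explicit `raise` also lets callers catch a specific exception type instead of relying on `AssertionError`.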