Added Qwen-2.5-vl Model support for 1-step VisionAgent and updated reproducibility results with MiniWob and WorkArena #246
Changes from all commits: f7bf47d, bfc68db, 9a84b80, 5bd690e, 16af31b
@@ -42,3 +42,46 @@
     chat_model_args=CHAT_MODEL_ARGS_DICT["openrouter/anthropic/claude-3.5-sonnet:beta"],
     flags=DEFAULT_PROMPT_FLAGS,
 )
+
+VISUAL_AGENT_QWEN_2_5_VL_32B = VisualAgentArgs(
+    chat_model_args=CHAT_MODEL_ARGS_DICT["openrouter/qwen/qwen2.5-vl-32b-instruct"],
+    flags=DEFAULT_PROMPT_FLAGS,
+)
+
+
+def get_som_agent(llm_config: str):
+    """Creates basic 1-step vision SOM agent"""
+    assert llm_config in CHAT_MODEL_ARGS_DICT, f"Unsupported LLM config: {llm_config}"
+
Korbit review comment: Assert used for production validation
What is the issue? Using assert for input validation in a function that could be called from production code.
Why this matters: Assertions can be disabled with Python's -O flag in production, leaving the function vulnerable to invalid inputs without any error handling.
Suggested change: Replace the assertion with explicit error handling:

    if llm_config not in CHAT_MODEL_ARGS_DICT:
        raise ValueError(f"Unsupported LLM config: {llm_config}")
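The reviewer's point is easy to demonstrate outside the PR: CPython compiles assert statements away under the -O flag, so the guard disappears entirely. A minimal, self-contained demo; the file name and config set here are illustrative, not from this repo:

    # demo_assert.py: run with `python demo_assert.py`, then with `python -O demo_assert.py`
    KNOWN_CONFIGS = {"openrouter/qwen/qwen2.5-vl-32b-instruct"}

    def get_config(name: str) -> str:
        assert name in KNOWN_CONFIGS, f"Unsupported LLM config: {name}"
        return name

    try:
        get_config("bogus-config")
        print("no error: the assert was stripped by -O")
    except AssertionError as exc:
        print(f"caught: {exc}")  # printed on a normal (non -O) run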
+    obs_flags = dp.ObsFlags(
+        use_tabs=True,
+        use_error_logs=True,
+        use_past_error_logs=False,
+        use_screenshot=True,
+        use_som=True,
+        openai_vision_detail="auto",
+    )
+    action_flags = dp.ActionFlags(
+        action_set=bgym.HighLevelActionSetArgs(subsets=["bid"]),
+        long_description=True,
+        individual_examples=False,
+    )
+    som_prompt_flags = PromptFlags(
+        obs=obs_flags,
+        action=action_flags,
+        use_thinking=True,
+        use_concrete_example=False,
+        use_abstract_example=True,
+        enable_chat=False,
+        extra_instructions=None,
+    )
+
+    agent_args = VisualAgentArgs(
+        chat_model_args=CHAT_MODEL_ARGS_DICT[llm_config],
+        flags=som_prompt_flags,
+    )
+    model_name = agent_args.chat_model_args.model_name
+    agent_args.agent_name = f"VisualAgent-som-{model_name}".replace("/", "_")
+
+    return agent_args
+
+
+VISUAL_SOM_AGENT_LLAMA4_17B_INSTRUCT = get_som_agent("openrouter/meta-llama/llama-4-maverick")
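For orientation, a sketch of how the new factory and the Qwen entry fit together at call time; the import path below is a guess at this repo's layout, not taken from the diff:

    # Hypothetical usage; assumes this module exposes get_som_agent as added above.
    from agentlab.agents.visual_agent.agent_configs import get_som_agent

    qwen_som = get_som_agent("openrouter/qwen/qwen2.5-vl-32b-instruct")
    print(qwen_som.agent_name)
    # -> "VisualAgent-som-qwen_qwen2.5-vl-32b-instruct" (slashes replaced by the factory)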
@@ -207,4 +207,20 @@
         max_new_tokens=64_000,
         temperature=1e-1,
     ),
+    "openrouter/qwen/qwen2.5-vl-32b-instruct": OpenRouterModelArgs(
+        model_name="qwen/qwen2.5-vl-32b-instruct",
+        max_total_tokens=128_000,
+        max_input_tokens=120_000,
+        max_new_tokens=8_000,
Korbit review comment on lines +212 to +214: No Token Buffer Safety Margin
What is the issue? The sum of max_input_tokens and max_new_tokens equals max_total_tokens exactly, which could lead to token limit errors at runtime.
Why this matters: When the actual input gets close to max_input_tokens, even slightly exceeding it due to tokenization differences could cause failures, since there is no buffer in the total token limit.
Suggested change: Add a small buffer by reducing max_input_tokens or max_new_tokens so the total stays below max_total_tokens:

        max_total_tokens=128_000,
        max_input_tokens=119_000,  # Reduced to provide buffer
        max_new_tokens=8_000,
+        temperature=1e-1,
+        vision_support=True,
+    ),
+    "openrouter/qwen/qwen3-32b": OpenRouterModelArgs(
+        model_name="qwen/qwen3-32b",
+        max_total_tokens=128_000,
+        max_input_tokens=120_000,
+        max_new_tokens=8_000,
+        temperature=1e-1,
+        vision_support=True,
+    ),
 }
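The arithmetic behind that review comment: 120_000 + 8_000 = 128_000, so the input and output budgets fill the context window with zero headroom. A standalone sketch of the kind of guard the reviewer is asking for; check_token_budget and its 1_000-token default are illustrative, not part of this repo:

    def check_token_budget(max_total: int, max_input: int, max_new: int, buffer: int = 1_000) -> None:
        """Raise if the input + output budgets leave less than `buffer` tokens of headroom."""
        if max_input + max_new > max_total - buffer:
            raise ValueError(
                f"max_input_tokens ({max_input}) + max_new_tokens ({max_new}) "
                f"leaves no headroom under max_total_tokens ({max_total})"
            )

    check_token_budget(128_000, 119_000, 8_000)  # passes: exactly the 1_000-token buffer remains
    check_token_budget(128_000, 120_000, 8_000)  # raises: budgets fill the window exactly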
Korbit review comment: Duplicated Word in Documentation
What is the issue? There is a duplicated "to" in the description of VisualAgent.
Why this matters: The duplicate word makes the documentation incorrect and unprofessional, affecting the clarity of the API documentation for users.
Suggested change: Replace the line with a correctly formatted description.