Fm/debug examples by filip-michalsky · Pull Request #916 · PrimeIntellect-ai/verifiers

filip-michalsky · 2026-02-15T17:51:56Z

Description

Two fixes for the BrowserEnv CUA mode and a DX improvement for the DOM example:

Fix screenshot passing in CUA messages: Screenshots attached as image_url parts inside tool messages were being dropped by clients. The new env_response override in
BrowserEnv relocates image_url parts out of tool messages into a trailing UserMessage so they are preserved. Also normalizes empty tool-call arguments to "{}" to prevent
json.loads("") decode failures on zero-arg tools (e.g. screenshot).
Make filter_screenshots_in_messages robust to mixed content types: Content parts can be plain dicts or Pydantic objects. Updated type checks in CUAMode to handle both
via _get_item_type() helper and getattr fallback.
DX improvement for DOM example: Made project_id optional in load_environment() — it now falls back to the BROWSERBASE_PROJECT_ID env var at runtime. Removed the
upfront env-var validation that eagerly errored before the environment was even used.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Files changed:

environments/browser_dom_example/browser_dom_example.py — make project_id optional, remove eager env-var check
verifiers/envs/integrations/browser_env/browser_env.py — new env_response override: sanitize empty tool args + relocate screenshots from tool messages to user messages
verifiers/envs/integrations/browser_env/modes/cua_mode.py — handle dict and Pydantic content parts in filter_screenshots_in_messages

Note

Medium Risk
Touches browser/sandbox session lifecycle and retry behavior and changes how screenshots/messages are post-processed, which could affect CUA stability and provider-facing error handling under load.

Overview
Improves BrowserEnv CUA message handling by normalizing empty zero-arg tool-call arguments to {} and relocating image_url parts found in tool messages into a trailing user message so screenshots aren’t dropped by downstream adapters.

Hardens CUA session creation with a new CUASessionCreateError, dedicated retry/backoff controls for session creation (separate from other operations), richer error parsing (HTTP status + JSON body), and best-effort cleanup of partially created sandboxes/sessions on setup failure.

Updates the CUA Fastify template server to apply session-create backpressure (configurable concurrent/queued limits), classify provider failures into structured SessionCreateError responses (retryable, statusCode), and documents the new server env vars; also makes the DOM example’s project_id optional and adds wandb to project dependencies with expanded tests for the new behaviors.

^{Written by Cursor Bugbot for commit a62cc90. This will update automatically on new commits. Configure here.}

…example

verifiers/envs/integrations/browser_env/modes/cua_mode.py

verifiers/envs/integrations/browser_env/browser_env.py

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-02-20T20:36:54Z

pyproject.toml

    "gepa",
    "pyzmq>=27.1.0",
    "msgpack>=1.1.2",
+    "wandb>=0.25.0",


Unrelated wandb dependency added to core package

High Severity

wandb>=0.25.0 is added to the core verifiers package dependencies, but wandb is not imported or used anywhere in the verifiers/ source tree. It is only used in the separate packages/verifiers-rl/ package, which already declares its own wandb dependency. This change is unrelated to the PR description and adds a heavyweight transitive dependency (~hundreds of MB) for all users of the core package.

cursor · 2026-02-20T20:36:54Z

verifiers/envs/integrations/browser_env/modes/cua_mode.py

+            content = _get_message_content(msg)
            if isinstance(content, list):
                for content_idx, item in enumerate(content):
-                    if isinstance(item, dict) and item.get("type") == "image_url":


Curl path missing retryable=False for parse errors

Medium Severity

CUASessionCreateError raised in _create_session_curl for a JSON decode error on a successful (HTTP 200) response omits both retryable and status_code, defaulting to retryable=True. The equivalent HTTP path in _create_session_http correctly passes retryable=False and status_code=resp.status. This causes the curl path to wastefully retry on a permanently malformed response.

filip-michalsky added 2 commits February 15, 2026 11:18

fall back to bb project id in env - devX improvement for running DOM …

a714cff

…example

ruff

7d80b80

cursor bot reviewed Feb 15, 2026

View reviewed changes

verifiers/envs/integrations/browser_env/modes/cua_mode.py Show resolved Hide resolved

verifiers/envs/integrations/browser_env/browser_env.py Outdated Show resolved Hide resolved

filip-michalsky added 6 commits February 15, 2026 19:01

simplify fix

6da43f3

fix ty

4b33baf

ruff

fa40f0b

reduce OOM issues

395c79b

fix memory leak

29f99e5

ty fix

a62cc90

cursor bot reviewed Feb 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Fm/debug examples#916

Fm/debug examples#916
filip-michalsky wants to merge 8 commits intoPrimeIntellect-ai:mainfrom
filip-michalsky:fm/debug-examples

filip-michalsky commented Feb 15, 2026 •

edited by cursor bot

Loading

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Feb 20, 2026

Uh oh!

cursor bot Feb 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

filip-michalsky commented Feb 15, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Testing

Checklist

Additional Notes

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Feb 20, 2026

Choose a reason for hiding this comment

Unrelated wandb dependency added to core package

Uh oh!

cursor bot Feb 20, 2026

Choose a reason for hiding this comment

Curl path missing retryable=False for parse errors

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

filip-michalsky commented Feb 15, 2026 •

edited by cursor bot

Loading