Skip to content

Comments

Fm/debug examples#916

Open
filip-michalsky wants to merge 8 commits intoPrimeIntellect-ai:mainfrom
filip-michalsky:fm/debug-examples
Open

Fm/debug examples#916
filip-michalsky wants to merge 8 commits intoPrimeIntellect-ai:mainfrom
filip-michalsky:fm/debug-examples

Conversation

@filip-michalsky
Copy link
Contributor

@filip-michalsky filip-michalsky commented Feb 15, 2026

Description

Two fixes for the BrowserEnv CUA mode and a DX improvement for the DOM example:

  1. Fix screenshot passing in CUA messages: Screenshots attached as image_url parts inside tool messages were being dropped by clients. The new env_response override in
    BrowserEnv relocates image_url parts out of tool messages into a trailing UserMessage so they are preserved. Also normalizes empty tool-call arguments to "{}" to prevent
    json.loads("") decode failures on zero-arg tools (e.g. screenshot).

  2. Make filter_screenshots_in_messages robust to mixed content types: Content parts can be plain dicts or Pydantic objects. Updated type checks in CUAMode to handle both
    via _get_item_type() helper and getattr fallback.

  3. DX improvement for DOM example: Made project_id optional in load_environment() — it now falls back to the BROWSERBASE_PROJECT_ID env var at runtime. Removed the
    upfront env-var validation that eagerly errored before the environment was even used.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Files changed:

  • environments/browser_dom_example/browser_dom_example.py — make project_id optional, remove eager env-var check
  • verifiers/envs/integrations/browser_env/browser_env.py — new env_response override: sanitize empty tool args + relocate screenshots from tool messages to user messages
  • verifiers/envs/integrations/browser_env/modes/cua_mode.py — handle dict and Pydantic content parts in filter_screenshots_in_messages

Note

Medium Risk
Touches browser/sandbox session lifecycle and retry behavior and changes how screenshots/messages are post-processed, which could affect CUA stability and provider-facing error handling under load.

Overview
Improves BrowserEnv CUA message handling by normalizing empty zero-arg tool-call arguments to {} and relocating image_url parts found in tool messages into a trailing user message so screenshots aren’t dropped by downstream adapters.

Hardens CUA session creation with a new CUASessionCreateError, dedicated retry/backoff controls for session creation (separate from other operations), richer error parsing (HTTP status + JSON body), and best-effort cleanup of partially created sandboxes/sessions on setup failure.

Updates the CUA Fastify template server to apply session-create backpressure (configurable concurrent/queued limits), classify provider failures into structured SessionCreateError responses (retryable, statusCode), and documents the new server env vars; also makes the DOM example’s project_id optional and adds wandb to project dependencies with expanded tests for the new behaviors.

Written by Cursor Bugbot for commit a62cc90. This will update automatically on new commits. Configure here.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

"gepa",
"pyzmq>=27.1.0",
"msgpack>=1.1.2",
"wandb>=0.25.0",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated wandb dependency added to core package

High Severity

wandb>=0.25.0 is added to the core verifiers package dependencies, but wandb is not imported or used anywhere in the verifiers/ source tree. It is only used in the separate packages/verifiers-rl/ package, which already declares its own wandb dependency. This change is unrelated to the PR description and adds a heavyweight transitive dependency (~hundreds of MB) for all users of the core package.

Fix in Cursor Fix in Web

content = _get_message_content(msg)
if isinstance(content, list):
for content_idx, item in enumerate(content):
if isinstance(item, dict) and item.get("type") == "image_url":
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curl path missing retryable=False for parse errors

Medium Severity

CUASessionCreateError raised in _create_session_curl for a JSON decode error on a successful (HTTP 200) response omits both retryable and status_code, defaulting to retryable=True. The equivalent HTTP path in _create_session_http correctly passes retryable=False and status_code=resp.status. This causes the curl path to wastefully retry on a permanently malformed response.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant