Commit 9aee49b
🤖 ci: deflake MCP screenshot integration test (#1173)
Deflake MCP Chrome screenshot integration test.
Changes:
- Stop leaking MCP/Chrome processes by ensuring `setupWorkspace()` tears
down the workspace via `workspace.remove(...)` and disposes services
during test cleanup.
- Make the screenshot assertions deterministic by forcing a
`chrome_take_screenshot` tool call via `toolPolicy: require` (no longer
depends on the model deciding to use tools).
- Reduce CI variance: pin `chrome-devtools-mcp`, use a fixed viewport,
run the PNG/JPEG cases sequentially, and increase the
post-`stream-end` tool-call wait.
Validation:
- `make static-check`
---
<details>
<summary>📋 Implementation Plan</summary>
# Deflake: `tests/ipc/mcpConfig.test.ts` MCP screenshot test
## What’s failing
- **Test:** `MCP server integration with model › MCP PNG image content
is correctly transformed to AI SDK format`
- **Failure:** `waitForToolCallEnd(…, "chrome_take_screenshot", …)`
returns `undefined` → no matching `tool-call-end` event.
## Likely root causes (ranked)
1. **Model nondeterminism:** the prompt uses a well-known page
(`example.com`), so the model can sometimes answer from prior knowledge
and skip `chrome_take_screenshot` entirely.
2. **Leaked MCP server processes between tests:** `setupWorkspace()` (in
`tests/ipc/setup.ts`) never calls `workspace.remove`, so MCP servers
started during a test can keep running. That matches the suite’s “Force
exiting Jest…” warning and can cause resource contention / sporadic MCP
startup failures.
3. **Timing/resource contention:** the test runs PNG+JPEG cases
concurrently and starts headless Chrome via `npx`; on slower CI hosts,
tool execution and event delivery may exceed the current 20s polling
window.
<details>
<summary>🔎 Evidence in repo</summary>
- `tests/ipc/setup.ts::setupWorkspace()` cleanup only deletes temp dirs;
it does **not** call `env.orpc.workspace.remove({ workspaceId })`.
- `WorkspaceService.remove()` explicitly stops MCP servers via
`mcpServerManager.stopServers(workspaceId)`.
- `mcpConfig.test.ts` depends on the model choosing to call
`chrome_take_screenshot` (not enforced).
</details>
---
## Recommended approach (A): Keep the integration test, but make it deterministic
**Net new product LoC:** ~0 (test/harness only)
### A1) Fix cleanup so MCP servers don’t leak across tests
1. Update `tests/ipc/setup.ts::setupWorkspace()`’s `cleanup()` to:
   - `await env.orpc.workspace.remove({ workspaceId }).catch(() => {})`
     (must run **before** deleting `env.tempDir`)
   - `await env.services.dispose()` (clears MCP idle interval + terminates
     background procs)
   - then run existing `cleanupTestEnvironment(env)` +
     `cleanupTempGitRepo(tempGitRepo)`
This should eliminate orphaned Chrome/MCP processes and reduce CI flake
across the whole integration suite.
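
A minimal sketch of the teardown ordering described in A1. The `CleanupDeps` interface and `makeCleanup` are hypothetical stand-ins for the real harness calls (`env.orpc.workspace.remove`, `env.services.dispose`, `cleanupTestEnvironment`, `cleanupTempGitRepo`); the actual signatures in `tests/ipc/setup.ts` may differ.

```typescript
// Hypothetical stand-ins for the real harness calls; signatures are assumed.
interface CleanupDeps {
  removeWorkspace: () => Promise<void>; // e.g. env.orpc.workspace.remove({ workspaceId })
  disposeServices: () => Promise<void>; // e.g. env.services.dispose()
  cleanupTestEnvironment: () => Promise<void>; // deletes temp dirs
  cleanupTempGitRepo: () => Promise<void>;
}

// Builds a cleanup function that stops MCP servers before temp dirs are deleted.
function makeCleanup(deps: CleanupDeps): () => Promise<void> {
  return async () => {
    // Best effort: a failed removal must not block the remaining teardown.
    await deps.removeWorkspace().catch(() => {});
    await deps.disposeServices();
    await deps.cleanupTestEnvironment();
    await deps.cleanupTempGitRepo();
  };
}
```

Swallowing the error from the removal step mirrors the `.catch(() => {})` in the plan: the later steps still run even if the workspace is already gone.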
### A2) Stop relying on the model “choosing” to call screenshot tools
Modify `mcpConfig.test.ts` so the test asserts the transformation path
without depending on free-form model behavior.
Concrete options (pick one):
**Option 1 (preferred): force the tool call using `toolPolicy: require`
and don’t assert the description**
- Send a minimal prompt like:
  - PNG: “Call `chrome_take_screenshot` now.”
  - JPEG: “Call `chrome_take_screenshot` with format "jpeg".”
- Pass `options.toolPolicy = [{ regex_match: "chrome_take_screenshot", action: "require" }]`.
- Only assert:
  - a `tool-call-end` event exists for `chrome_take_screenshot`
  - `assertValidScreenshotResult(…, mediaTypePattern)` passes
- Drop (or relax) `assertModelDescribesScreenshot()`; it adds LLM-output
flake and isn’t needed to validate the MCP→AI-SDK media transformation.
**Option 2: split into two required calls (navigate then screenshot)**
- Message 1: require `chrome_navigate_page` and instruct URL.
- Message 2: require `chrome_take_screenshot`.
- This is useful only if we still want to validate “example.com”
specifically; otherwise it’s extra moving parts.
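
Option 1 above could be expressed roughly as follows. This is only a sketch: `screenshotRequest` and the `ToolPolicyRule` type are hypothetical, and only the `regex_match` / `action: "require"` fields come from the plan itself.

```typescript
// Assumed rule shape; only regex_match and action: "require" appear in the plan.
type ToolPolicyRule = { regex_match: string; action: string };

interface ScreenshotRequest {
  prompt: string;
  options: { toolPolicy: ToolPolicyRule[] };
}

// Builds the minimal prompt plus a policy that forces the screenshot tool call.
function screenshotRequest(format: "png" | "jpeg"): ScreenshotRequest {
  const prompt =
    format === "jpeg"
      ? 'Call `chrome_take_screenshot` with format "jpeg".'
      : "Call `chrome_take_screenshot` now.";
  return {
    prompt,
    options: {
      toolPolicy: [{ regex_match: "chrome_take_screenshot", action: "require" }],
    },
  };
}
```

Because the policy requires the tool, the test no longer depends on how the model interprets the prompt; the prompt only needs to carry the format argument.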
### A3) Reduce environment-driven variance
- Pin the MCP server version: replace `chrome-devtools-mcp@latest` with
the currently observed version (`chrome-devtools-mcp@0.12.1`).
- Add a deterministic viewport (smaller = faster + avoids huge PNGs):
`--viewport 1280x720`.
- If CI still flakes, run PNG/JPEG sequentially (remove
`test.concurrent.each`).
- Increase `waitForToolCallEnd` timeout from 20s → 60s (CI headless
Chrome can be slow).
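
To illustrate the timeout change, here is a sketch of a polling wait that returns `undefined` when no matching event arrives in time, which is the failure mode described under "What’s failing". The `StreamEvent` shape and `pollForToolCallEnd` are hypothetical; the real `waitForToolCallEnd` helper may take different arguments.

```typescript
// Assumed event shape; the real stream events carry more fields.
interface StreamEvent {
  type: string;
  toolName?: string;
}

// Polls collected events until a matching tool-call-end appears or the deadline passes.
async function pollForToolCallEnd(
  getEvents: () => StreamEvent[],
  toolName: string,
  timeoutMs: number = 60_000, // raised from 20s per the plan
  pollMs: number = 100,
): Promise<StreamEvent | undefined> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const hit = getEvents().find(
      (e) => e.type === "tool-call-end" && e.toolName === toolName,
    );
    if (hit) return hit;
    if (Date.now() >= deadline) return undefined; // caller sees "no matching event"
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```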
---
## Alternative approach (B): Move correctness to unit tests; keep only a small integration smoke test
**Net new product LoC:** ~0
1. Add a unit test suite for `src/node/services/mcpResultTransform.ts`:
   - converts MCP `{ content: [{type:"image", data, mimeType}] }` → `{ type:"content", value:[{type:"media", …}] }`
   - preserves `mimeType` → `mediaType`
   - validates the size guard behavior (`MAX_IMAGE_DATA_BYTES`) deterministically
2. Replace the flaky Chrome+model integration assertion with:
   - existing `memory_create_entities` MCP integration test (already present)
   - optional: a chrome MCP “tools available” test (no screenshot, no model)
Use this if we decide that a full Chrome+LLM flow is too expensive/flaky
for CI.
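
The transformation that such unit tests would exercise might look like the sketch below. Everything here is an assumption made for illustration: the type shapes, the `transformMcpResult` name, the placeholder limit, and especially the size-guard behavior (replacing an oversized image with a text note). The real logic in `mcpResultTransform.ts` may differ.

```typescript
// Assumed shapes; the real types in mcpResultTransform.ts may differ.
type McpContent =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

type SdkPart =
  | { type: "text"; text: string }
  | { type: "media"; data: string; mediaType: string };

// Placeholder limit for illustration only; the real constant's value is not given in the plan.
const MAX_IMAGE_DATA_BYTES = 8 * 1024 * 1024;

function transformMcpResult(
  result: { content: McpContent[] },
  maxBytes: number = MAX_IMAGE_DATA_BYTES,
): { type: "content"; value: SdkPart[] } {
  const value = result.content.map((part): SdkPart => {
    if (part.type !== "image") return part; // text passes through unchanged
    // Uses the base64 string length as a simple proxy for payload size.
    if (part.data.length > maxBytes) {
      return { type: "text", text: `[image omitted: ${part.data.length} chars exceeds limit]` };
    }
    // mimeType is carried over as mediaType.
    return { type: "media", data: part.data, mediaType: part.mimeType };
  });
  return { type: "content", value };
}
```

Passing `maxBytes` as a parameter lets a unit test trigger the size guard with a tiny payload instead of generating megabytes of image data.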
---
## Optional product hardening (nice-to-have)
**Net new product LoC:** ~20–60
- Consider making `MCPServerManager.dispose()` stop all running
workspace servers (not just clear the idle interval). This would harden
app shutdown behavior and prevent long-lived processes in any non-test
embedding.
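
One possible shape for this hardening is sketched below. `McpServerManagerSketch` and `StoppableServer` are hypothetical and do not reflect the real `MCPServerManager` internals; only the idea of `dispose()` also stopping every workspace's servers comes from the note above.

```typescript
// Hypothetical server handle; the real manager tracks richer state.
interface StoppableServer {
  stop: () => Promise<void>;
}

class McpServerManagerSketch {
  private readonly servers = new Map<string, StoppableServer[]>();
  private idleInterval: ReturnType<typeof setInterval> | undefined;

  startIdleCheck(intervalMs: number): void {
    this.idleInterval = setInterval(() => {
      /* idle eviction would go here */
    }, intervalMs);
  }

  register(workspaceId: string, server: StoppableServer): void {
    const list = this.servers.get(workspaceId) ?? [];
    list.push(server);
    this.servers.set(workspaceId, list);
  }

  async stopServers(workspaceId: string): Promise<void> {
    const list = this.servers.get(workspaceId) ?? [];
    this.servers.delete(workspaceId);
    // One failing stop must not prevent the others from being stopped.
    await Promise.all(list.map((s) => s.stop().catch(() => {})));
  }

  // Clears the idle interval and stops every workspace's servers.
  async dispose(): Promise<void> {
    if (this.idleInterval !== undefined) {
      clearInterval(this.idleInterval);
      this.idleInterval = undefined;
    }
    const ids = Array.from(this.servers.keys());
    await Promise.all(ids.map((id) => this.stopServers(id)));
  }

  get runningWorkspaceCount(): number {
    return this.servers.size;
  }
}
```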
---
## Validation
- Run the failing test in a loop (CI-like):
  - `TEST_INTEGRATION=1 bun x jest tests/ipc/mcpConfig.test.ts -t "image content" --runInBand`
- Confirm:
  - `tool-call-end` for `chrome_take_screenshot` is always present
  - no lingering Node handles (the “Force exiting Jest…” warning disappears or is reduced)
</details>
---
_Generated with `mux` • Model: `openai:gpt-5.2` • Thinking: `xhigh`_
---------
Signed-off-by: Thomas Kosiewski <tk@coder.com>

1 parent 42385f0 commit 9aee49b
File tree
6 files changed, +374 -42 lines changed
- src/node/services
- tests/ipc
- fixtures