
Commit 4e342d5

🤖 ci: make grep|head test assertion deterministic (#1261)
The test was flaky because it checked whether the LLM response text contained `terminal bench`. LLMs sometimes summarize command output instead of quoting it verbatim.

The assertion now verifies that the bash tool completed by checking for `tool-call-end` events, which is deterministic and directly tests what we care about (the command completed without hanging).

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-5` • Thinking: `high`_
1 parent 133adb8 commit 4e342d5

1 file changed: +9 -4 lines changed


tests/ipc/runtimeExecuteBash.test.ts

Lines changed: 9 additions & 4 deletions
@@ -387,11 +387,16 @@ describeIntegration("Runtime Bash Execution", () => {
       // Calculate actual tool execution duration
       const toolDuration = getToolDuration(events, "bash");
 
-      // Extract response text
-      const responseText = extractTextFromEvents(events);
-
       // Verify command completed successfully (not timeout)
-      expect(responseText).toContain("terminal bench");
+      // Check that the bash tool completed (tool-call-end events exist)
+      const toolCallEnds = events.filter(
+        (e) =>
+          "type" in e &&
+          e.type === "tool-call-end" &&
+          "toolName" in e &&
+          e.toolName === "bash"
+      );
+      expect(toolCallEnds.length).toBeGreaterThan(0);
 
       // Verify command completed quickly (not hanging until timeout)
       // SSH runtime should complete in <10s even with high latency
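
The difference between the old and new assertions can be illustrated with a small standalone sketch. It is not the repository's code: the `UnknownEvent` type, the `countToolCallEnds` helper, and the `tool-call-start` and `text-delta` event names are assumptions made for illustration. Only the `tool-call-end` type and the `toolName` field come from the diff above.

```typescript
// Hypothetical loose event shape; the real event types in the repository may differ.
type UnknownEvent = Record<string, unknown>;

// Count the "tool-call-end" events emitted for a given tool.
// This mirrors the filter in the diff, wrapped in an illustrative helper.
function countToolCallEnds(events: UnknownEvent[], toolName: string): number {
  return events.filter(
    (e) => e["type"] === "tool-call-end" && e["toolName"] === toolName
  ).length;
}

// Sample stream: the bash tool finishes, but the model paraphrases the output
// instead of quoting it. Event names other than "tool-call-end" are assumed.
const events: UnknownEvent[] = [
  { type: "tool-call-start", toolName: "bash" },
  { type: "tool-call-end", toolName: "bash" },
  { type: "text-delta", text: "The command printed one matching line." },
];

// Old style of check: depends on the wording the model happens to choose.
const responseText = events
  .filter((e) => e["type"] === "text-delta")
  .map((e) => String(e["text"]))
  .join("");
const quotedVerbatim = responseText.includes("terminal bench");

// New style of check: depends only on the tool having completed.
const bashCompleted = countToolCallEnds(events, "bash") > 0;

console.log({ quotedVerbatim, bashCompleted });
```

With this sample stream the text-based check fails even though the command finished, while the event-based check passes, which is the flakiness the commit message describes.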
