Commit 4e342d5
🤖 ci: make grep|head test assertion deterministic (#1261)
The test was flaky because it checked if the LLM response text contained
`terminal bench`. LLMs sometimes summarize command output instead of
quoting it verbatim.
The test now verifies that the bash tool completed by checking for
`tool-call-end` events, which is deterministic and directly tests what
we care about (the command completed without hanging).
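
The difference between the two assertions can be sketched as follows. This is a minimal illustration, not the code from this commit: the `StreamEvent` shape, the `toolName` field, and both helper functions are hypothetical, and only the `tool-call-end` event name and the `terminal bench` string come from the message above.

```typescript
// Hypothetical event shape, assumed for illustration only.
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "tool-call-start"; toolName: string }
  | { type: "tool-call-end"; toolName: string };

// Old-style check: passes only if the model quotes the command output verbatim.
function responseQuotesOutput(events: StreamEvent[], needle: string): boolean {
  const text = events
    .filter(
      (e): e is Extract<StreamEvent, { type: "text-delta" }> =>
        e.type === "text-delta",
    )
    .map((e) => e.text)
    .join("");
  return text.includes(needle);
}

// New-style check: passes whenever the bash tool reported completion,
// regardless of how the model words its reply.
function bashToolCompleted(events: StreamEvent[]): boolean {
  return events.some(
    (e) => e.type === "tool-call-end" && e.toolName === "bash",
  );
}

// A run where the model summarizes the output instead of quoting it.
const summarizedRun: StreamEvent[] = [
  { type: "tool-call-start", toolName: "bash" },
  { type: "tool-call-end", toolName: "bash" },
  { type: "text-delta", text: "The command found one matching line." },
];
```

With `summarizedRun`, the text-based check fails even though the command finished, while the event-based check passes. A hung command would never emit `tool-call-end`, so the event-based check would still catch the failure the test exists to detect.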
---
_Generated with `mux` • Model: `anthropic:claude-opus-4-5` • Thinking: `high`_

1 parent 133adb8 commit 4e342d5
1 file changed, +9 -4 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 387 | 387 | |
| 388 | 388 | |
| 389 | 389 | |
| 390 | | - |
| 391 | | - |
| 392 | | - |
| 393 | 390 | |
| 394 | | - |
| | 391 | + |
| | 392 | + |
| | 393 | + |
| | 394 | + |
| | 395 | + |
| | 396 | + |
| | 397 | + |
| | 398 | + |
| | 399 | + |
| 395 | 400 | |
| 396 | 401 | |
| 397 | 402 | |