Nightly Terminal-Bench #8

Triggered via schedule November 3, 2025 00:04
Status Cancelled
Total duration 3h 0m 25s
Artifacts 2
Determine models to test
3s
Matrix: benchmark

Annotations

4 errors
anthropic:claude-sonnet-4-5 / Run Terminal-Bench (anthropic:claude-sonnet-4-5)
The job has exceeded the maximum execution time of 3h0m0s
openai:gpt-5-codex / Run Terminal-Bench (openai:gpt-5-codex)
The job has exceeded the maximum execution time of 3h0m0s
openai:gpt-5-codex / Run Terminal-Bench (openai:gpt-5-codex)
The operation was canceled.

Artifacts

Produced during runtime
Name  Size  Digest
terminal-bench-results-anthropic-claude-sonnet-4-5-19020121246 (Expired)  11 MB  sha256:ff885a97e88b7928a86e7a756468f4f22a1de70e41b46aec06b619b9d1ea2afc
terminal-bench-results-openai-gpt-5-codex-19020121246 (Expired)  7.35 MB  sha256:7d364db408a213e6ea0c8b1084549d8bb2cb90d470e2dc41da69cc0e0ce81d1e