Nightly Terminal-Bench #8
nightly-terminal-bench.yml
on: schedule
Determine models to test (3s)
Matrix: benchmark
Annotations
4 errors
| Job | Error |
|---|---|
| anthropic:claude-sonnet-4-5 / Run Terminal-Bench (anthropic:claude-sonnet-4-5) | The job has exceeded the maximum execution time of 3h0m0s |
| anthropic:claude-sonnet-4-5 / Run Terminal-Bench (anthropic:claude-sonnet-4-5) | The operation was canceled. |
| openai:gpt-5-codex / Run Terminal-Bench (openai:gpt-5-codex) | The job has exceeded the maximum execution time of 3h0m0s |
| openai:gpt-5-codex / Run Terminal-Bench (openai:gpt-5-codex) | The operation was canceled. |
Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| terminal-bench-results-anthropic-claude-sonnet-4-5-19020121246 (Expired) | 11 MB | sha256:ff885a97e88b7928a86e7a756468f4f22a1de70e41b46aec06b619b9d1ea2afc |
| terminal-bench-results-openai-gpt-5-codex-19020121246 (Expired) | 7.35 MB | sha256:7d364db408a213e6ea0c8b1084549d8bb2cb90d470e2dc41da69cc0e0ce81d1e |