Nightly Terminal-Bench #8

Triggered via schedule November 3, 2025 00:04

github-merge-queue[bot]

main

Status Cancelled

Total duration 3h 0m 25s

Artifacts 2

nightly-terminal-bench.yml

on: schedule

Determine models to test

Matrix: benchmark

Annotations

4 errors

anthropic:claude-sonnet-4-5 / Run Terminal-Bench (anthropic:claude-sonnet-4-5)

The job has exceeded the maximum execution time of 3h0m0s

anthropic:claude-sonnet-4-5 / Run Terminal-Bench (anthropic:claude-sonnet-4-5)

The operation was canceled.

openai:gpt-5-codex / Run Terminal-Bench (openai:gpt-5-codex)

The job has exceeded the maximum execution time of 3h0m0s

openai:gpt-5-codex / Run Terminal-Bench (openai:gpt-5-codex)

The operation was canceled.

Artifacts

Produced during runtime

Name	Status	Size	Digest
terminal-bench-results-anthropic-claude-sonnet-4-5-19020121246	Expired	11 MB	`sha256:ff885a97e88b7928a86e7a756468f4f22a1de70e41b46aec06b619b9d1ea2afc`
terminal-bench-results-openai-gpt-5-codex-19020121246	Expired	7.35 MB	`sha256:7d364db408a213e6ea0c8b1084549d8bb2cb90d470e2dc41da69cc0e0ce81d1e`
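
The digests listed above can be used to check a locally saved copy of an artifact. The following is a minimal sketch, assuming the digest is the SHA-256 of the artifact archive as downloaded and that a copy was saved before the artifact expired. The function names and the file path in the usage note are hypothetical and do not come from the workflow.

```python
import hashlib
from pathlib import Path


def sha256_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the file's digest in the 'sha256:<hex>' form shown in the table."""
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so multi-megabyte archives are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def matches_listed_digest(path: Path, listed: str) -> bool:
    """Compare a local file against a digest string copied from the table.

    Surrounding whitespace and backticks are stripped, and the comparison
    is case-insensitive on the listed value.
    """
    return sha256_digest(path) == listed.strip().strip("`").lower()
```

For example, `matches_listed_digest(Path("results.zip"), "sha256:ff885a97...")` would return `True` only if the hypothetical `results.zip` hashes to the full listed value. Both artifacts in this run are marked Expired, so the check applies only to copies downloaded earlier.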