Nightly Terminal-Bench #50
nightly-terminal-bench.yml
on: schedule
Determine models to test
4s
Matrix: benchmark
Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| terminal-bench-results-anthropic-claude-sonnet-4-5-20199776463 | 6.75 MB | sha256:4313766ddf43c43ccbf8c7c58f4112b6905e5f0861357f0ae969216de75b1002 |
| terminal-bench-results-openai-gpt-5.1-codex-20199776463 | 5.6 MB | sha256:1424c0ace0092da1464d6a55cf8494bea4d01ebedf4c647709378f89c365fbfe |