42 commits
ec744a3
half way thru tb set up
Jul 10, 2025
d2a605d
tb yaml
Jul 10, 2025
bea6d11
TB
Jul 10, 2025
1c03b32
tb bench scripts
Jul 10, 2025
f063ef4
trigger on push change
Jul 10, 2025
62ea191
typo
Jul 10, 2025
73873b1
trigger push
Jul 10, 2025
79651ff
python v
Jul 10, 2025
832ac1e
run local branch instead of prod
Jul 11, 2025
8a32d56
unnecessary env variables removed
Jul 11, 2025
e6c6a80
allow log inspection
Jul 11, 2025
e47a2bf
implement _env
Jul 11, 2025
f5ed0ca
npm, rust, cargo, clone github directly
Jul 11, 2025
3ec0dbe
Update setup_amazon_q.sh
arjun37602 Jul 11, 2025
a2e6ebe
clean up disk + check amt of free space
Jul 11, 2025
e5f51df
get gcc dependencies
Jul 14, 2025
f53926d
big timeout
Jul 14, 2025
803eda1
pipe config files from gh runner to docker
Jul 14, 2025
8db47f7
configure env + working with sso
Jul 14, 2025
c908960
changed default
Jul 14, 2025
fb9262d
default to latest
Jul 14, 2025
ebdd054
fixing qchat location + forcing correct auth
Jul 14, 2025
1d665f5
set env vars not just config file
Jul 14, 2025
12adf86
env vars all caps
Jul 15, 2025
6888d4b
confirm env vairables are visible
Jul 15, 2025
2c52fc1
roleName + code simplify
Jul 15, 2025
a7e3150
environment variable fix + local working
Jul 15, 2025
cefeaf3
use the correct git hash
Jul 15, 2025
7a1b06a
larger runner for storage
arjun37602 Jul 15, 2025
52ce30d
use full hash instead of short hash
arjun37602 Jul 15, 2025
407993c
fail if hash invalid
arjun37602 Jul 15, 2025
ea88fed
Force to run on manual trigger
arjun37602 Jul 15, 2025
5f1e553
responding to PR comments
Jul 16, 2025
80be062
take git hash as user input to avoid confusion
Jul 17, 2025
8e7b759
description
Jul 17, 2025
0584a66
Merge branch 'main' into terminal-bench-automation
arjun37602 Jul 17, 2025
fd9a09b
change n and allow clean up
arjun37602 Aug 5, 2025
93c312d
Update terminal-bench.yaml
arjun37602 Aug 6, 2025
4f4643b
Update setup_amazon_q.sh to use gnu instead of musl
arjun37602 Aug 6, 2025
b1bdfee
set n tasks
arjun37602 Aug 7, 2025
f0ca415
mimic official installation and use gnu
Aug 7, 2025
77eb58d
run chat_cli executable
Aug 7, 2025
22 changes: 5 additions & 17 deletions .github/workflows/terminal-bench.yaml
@@ -7,12 +7,11 @@ name: Terminal-Bench
on:
workflow_dispatch:
inputs:
name:
description: 'Run terminal-bench workflow to test Q CLI in real terminal environments.'
default: 'all'
git_commit_hash:
description: 'Input git commit hash to run TB on (must exist on S3)'
required: true
default: 'latest'
type: string

jobs:
run-benchmark:
# avoids disk storage issues
@@ -21,6 +20,7 @@ jobs:
     env:
       CHAT_DOWNLOAD_ROLE_ARN: ${{ secrets.CHAT_DOWNLOAD_ROLE_ARN }}
       CHAT_BUILD_BUCKET_NAME: ${{ secrets.CHAT_BUILD_BUCKET_NAME }}
+      GIT_HASH: ${{ github.event.inputs.git_commit_hash }}
     permissions:
       id-token: write
       contents: read
@@ -41,18 +41,6 @@ jobs:
       - name: Checkout repository
         uses: actions/checkout@v4
 
-      # Captures git hash of branch to query specific S3 bucket
-      - name: Set git hash
-        run: |
-          if [ -n "$GITHUB_SHA" ]; then
-            git_hash=$(git rev-parse "$GITHUB_SHA")
-          else
-            git_hash="latest"
-          fi
-          # appends to github_env file
-          echo "GIT_HASH=$git_hash" >> $GITHUB_ENV
-          echo "Git hash set to: $git_hash"
-
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
@@ -73,7 +61,7 @@ jobs:
       - name: Run terminal benchmark
         run: |
           cd terminal-bench-test
-          tb run --agent-import-path main:AmazonQCLIAgent --dataset-name terminal-bench-core --dataset-version head
+          tb run --agent-import-path main:AmazonQCLIAgent --dataset-name terminal-bench-core --dataset-version head --cleanup --n-tasks=5
 
       # uploads results if run fails as well to allow for easy log inspection
       - name: Upload results
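
Review note: with this change `GIT_HASH` comes straight from the `git_commit_hash` workflow input, and the commit messages mention using the full hash and failing when the hash is invalid. A format check before any S3 request could surface a bad input earlier. The sketch below is one possible way to do that; `validate_git_hash` is a hypothetical helper that is not part of this PR, and it assumes only full 40-character lowercase SHA-1 hashes are acceptable (whether a sentinel such as `latest` should also be allowed is a separate decision).

```shell
# Hypothetical helper, not part of this PR.
# Succeeds only when $1 is exactly 40 lowercase hexadecimal characters.
validate_git_hash() {
    case "$1" in
        # Reject anything containing a character outside 0-9 and a-f.
        *[!0123456789abcdef]*)
            echo "ERROR: '$1' contains non-hex characters" >&2
            return 1
            ;;
    esac
    # Reject short hashes and empty input.
    if [ "${#1}" -ne 40 ]; then
        echo "ERROR: '$1' is not a full 40-character commit hash" >&2
        return 1
    fi
    return 0
}
```

A script could call `validate_git_hash "$GIT_HASH" || exit 1` before building the S3 path, so a short hash fails fast with a clear message instead of a failed download.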
15 changes: 10 additions & 5 deletions terminal-bench-test/setup_amazon_q.sh
@@ -34,7 +34,7 @@ Q_SESSION_TOKEN=$(echo $TEMP_CREDENTIALS | jq -r '.Credentials.SessionToken')

 # Download specific build from S3 based on commit hash
 echo "Downloading Amazon Q CLI build from S3..."
-S3_PREFIX="main/${GIT_HASH}/x86_64-unknown-linux-musl"
+S3_PREFIX="main/${GIT_HASH}/x86_64-unknown-linux-gnu"
 echo "Downloading qchat.zip from s3://.../${S3_PREFIX}/qchat.zip"

# Try download, if hash is invalid we fail.
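
Review note: the only thing this hunk changes is the target triple inside `S3_PREFIX`. If the triple needs to change again, building the prefix in one place may make that easier. The sketch below is an illustration only; `build_s3_prefix` is a hypothetical helper that is not in the PR, and its default assumes the `x86_64-unknown-linux-gnu` layout shown in the diff.

```shell
# Hypothetical helper, not part of this PR.
# $1: commit hash
# $2: target triple (optional; defaults to the gnu target used in this diff)
build_s3_prefix() {
    printf 'main/%s/%s\n' "$1" "${2:-x86_64-unknown-linux-gnu}"
}
```

Usage would look like `S3_PREFIX="$(build_s3_prefix "$GIT_HASH")"`, with the musl build still reachable by passing the triple explicitly as the second argument.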
@@ -45,15 +45,20 @@ AWS_ACCESS_KEY_ID="$QCHAT_ACCESSKEY" AWS_SECRET_ACCESS_KEY="$Q_SECRET_ACCESS_KEY
 echo "Extracting qchat.zip..."
 unzip -q qchat.zip
 
-# move it to /usr/local/bin/qchat for path as qchat may not work otherwise
-if cp qchat /usr/local/bin/ && chmod +x /usr/local/bin/qchat; then
+# Extract and install - the executable is named chat_cli
+# qchat → runs /usr/local/bin/qchat directly → which is the chat_cli binary
+
+if [ -f "chat_cli" ]; then
+    cp chat_cli /usr/local/bin/qchat
     ln -sf /usr/local/bin/qchat /usr/local/bin/q
+    chmod +x /usr/local/bin/qchat
     echo "qchat installed successfully"
 else
-    echo "ERROR: Failed to install qchat"
+    echo "ERROR: chat_cli executable not found"
+    ls -la
     exit 1
 fi
 
 echo "Cleaning q zip"
 rm -f qchat.zip
-rm -rf qchat
+rm -rf q qchat
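
Review note: in the new install block only the presence of `chat_cli` is checked, while the old version also failed when `cp` or `chmod` failed. The sketch below shows one way to keep both checks. It is a hedged illustration, not the PR's code: `install_chat_cli` and its two directory parameters are hypothetical, introduced so the logic can be exercised outside `/usr/local/bin`.

```shell
# Hypothetical helper, not part of this PR.
# $1: directory holding the extracted archive
# $2: destination bin directory (the diff installs into /usr/local/bin)
install_chat_cli() {
    src_dir="$1"
    bin_dir="$2"

    # Fail early, listing the directory for debugging, if the binary is missing.
    if [ ! -f "$src_dir/chat_cli" ]; then
        echo "ERROR: chat_cli executable not found in $src_dir" >&2
        ls -la "$src_dir" >&2
        return 1
    fi

    # Install under the name qchat; stop on the first step that fails.
    cp "$src_dir/chat_cli" "$bin_dir/qchat" || return 1
    chmod +x "$bin_dir/qchat" || return 1

    # Expose the same binary as q through a symlink.
    ln -sf "$bin_dir/qchat" "$bin_dir/q" || return 1

    echo "qchat installed successfully"
}
```

In the script this would be called as `install_chat_cli . /usr/local/bin || exit 1`. Taking the destination as a parameter also allows a dry run against a temporary directory.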