Skip to content

Commit 852e3e3

Browse files
committed
refactor(.agents): extract CLI agent factory into modular files
- Create lib/cli-agent-types.ts with CliAgentConfig interface - Create lib/cli-agent-prompts.ts with prompt template functions and REVIEW_CRITERIA - Create lib/cli-agent-schemas.ts with shared outputSchema - Create lib/create-cli-agent.ts with factory function and validation - Add shortName and cliCommand config fields (replaces parsing from startCommand) - Add optional model field with DEFAULT_MODEL fallback - Add runtime validation for required config fields and shortName format - Make testResults optional in outputSchema (review mode uses reviewFindings) - Move CODEX_REVIEW_MODE_INSTRUCTIONS back to codex-cli.ts - Remove unnecessary re-exports for cleaner imports - Fix bug: use startCommand instead of cliCommand in Individual Scripts example - Fix inconsistent newline handling in getSystemPrompt - Add as const to inputSchema type fields for proper type inference - Refactor all 4 CLI agents to use the factory (claude-code-cli, codex-cli, gemini-cli, codebuff-local-cli)
1 parent b63ed44 commit 852e3e3

File tree

8 files changed

+493
-1716
lines changed

8 files changed

+493
-1716
lines changed

.agents/claude-code-cli.ts

Lines changed: 8 additions & 446 deletions
Large diffs are not rendered by default.

.agents/codebuff-local-cli.ts

Lines changed: 10 additions & 447 deletions
Large diffs are not rendered by default.

.agents/codex-cli.ts

Lines changed: 26 additions & 374 deletions
Large diffs are not rendered by default.

.agents/gemini-cli.ts

Lines changed: 10 additions & 449 deletions
Large diffs are not rendered by default.

.agents/lib/cli-agent-prompts.ts

Lines changed: 286 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,286 @@
1+
import type { CliAgentConfig } from './cli-agent-types'
2+
3+
const TMUX_SESSION_DOCS = `## Session Logs (Paper Trail)
4+
5+
All session data is stored in **YAML format** in \`debug/tmux-sessions/{session-name}/\`:
6+
7+
- \`session-info.yaml\` - Session metadata (start time, dimensions, status)
8+
- \`commands.yaml\` - YAML array of all commands sent with timestamps
9+
- \`capture-{sequence}-{label}.txt\` - Captures with YAML front-matter
10+
11+
\`\`\`bash
12+
# Capture with a descriptive label (recommended)
13+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "after-help-command" --wait 2
14+
15+
# Capture saved to: debug/tmux-sessions/{session}/capture-001-after-help-command.txt
16+
\`\`\`
17+
18+
Each capture file has YAML front-matter with metadata:
19+
\`\`\`yaml
20+
---
21+
sequence: 1
22+
label: after-help-command
23+
timestamp: 2025-01-01T12:00:30Z
24+
after_command: "/help"
25+
dimensions:
26+
width: 120
27+
height: 30
28+
---
29+
[terminal content]
30+
\`\`\`
31+
32+
The capture path is printed to stderr. Both you and the parent agent can read these files to see exactly what the CLI displayed.`
33+
34+
const TMUX_DEBUG_TIPS = `## Debugging Tips
35+
36+
- **Attach interactively**: \`tmux attach -t SESSION_NAME\`
37+
- **List sessions**: \`./scripts/tmux/tmux-cli.sh list\`
38+
- **View session logs**: \`ls debug/tmux-sessions/{session-name}/\`
39+
- **Get help**: \`./scripts/tmux/tmux-cli.sh help\` or \`./scripts/tmux/tmux-start.sh --help\``
40+
41+
const REVIEW_CRITERIA = `### What We're Looking For
42+
43+
The review should focus on these key areas:
44+
45+
1. **Code Organization Issues**
46+
- Poor file/module structure
47+
- Unclear separation of concerns
48+
- Functions/classes that do too many things
49+
- Missing or inconsistent abstractions
50+
51+
2. **Over-Engineering & Complexity**
52+
- Unnecessarily abstract or generic code
53+
- Premature optimization
54+
- Complex patterns where simple solutions would suffice
55+
- "Enterprise" patterns in small codebases
56+
57+
3. **AI-Generated Code Patterns ("AI Slop")**
58+
- Verbose, flowery language in comments ("It's important to note...", "Worth mentioning...")
59+
- Excessive disclaimers and hedging in documentation
60+
- Inconsistent coding style within the same file
61+
- Overly generic variable/function names
62+
- Redundant explanatory comments that just restate the code
63+
- Sudden shifts between formal and casual tone
64+
- Filler phrases that add no value
65+
66+
4. **Lack of Systems-Level Thinking**
67+
- Missing error handling strategy
68+
- No consideration for scaling or performance
69+
- Ignoring edge cases and failure modes
70+
- Lack of observability (logging, metrics, tracing)
71+
- Missing or incomplete type definitions`
72+
73+
export function getSpawnerPrompt(config: CliAgentConfig): string {
74+
const base = `Expert at testing ${config.cliName} CLI functionality using tmux, or performing code reviews via ${config.cliName}.
75+
76+
**Modes:**
77+
- \`test\` (default): Spawns tmux sessions, sends input to ${config.cliName} CLI, captures terminal output, and validates behavior.
78+
- \`review\`: Uses ${config.cliName} CLI to perform code reviews on specified files or directories.
79+
80+
**Paper trail:** Session logs are saved to \`debug/tmux-sessions/{session}/\`. Use \`read_files\` to view captures.
81+
82+
**Your responsibilities as the parent agent:**
83+
1. If \`scriptIssues\` is not empty, fix the scripts in \`scripts/tmux/\` based on the suggested fixes
84+
2. Use \`read_files\` on the capture paths to see what the CLI displayed
85+
3. Re-run the test after fixing any script issues`
86+
87+
return config.spawnerPromptExtras ? `${base}\n\n${config.spawnerPromptExtras}` : base
88+
}
89+
90+
export function getSystemPrompt(config: CliAgentConfig): string {
91+
const cliSpecificSection = config.cliSpecificDocs ? `\n${config.cliSpecificDocs}\n` : '\n'
92+
93+
return `You are an expert at testing ${config.cliName} CLI using tmux. You have access to helper scripts that handle the complexities of tmux communication with TUI apps.
94+
95+
## ${config.cliName} Startup
96+
97+
For testing ${config.cliName}, use the \`--command\` flag with permission bypass:
98+
99+
\`\`\`bash
100+
# Start ${config.cliName} CLI (with permission bypass for testing)
101+
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "${config.startCommand}")
102+
103+
# Or with specific options
104+
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "${config.startCommand} --help")
105+
\`\`\`
106+
107+
**Important:** ${config.permissionNote}
108+
${cliSpecificSection}
109+
## Helper Scripts
110+
111+
Use these scripts in \`scripts/tmux/\` for reliable CLI testing:
112+
113+
### Unified Script (Recommended)
114+
115+
\`\`\`bash
116+
# Start a ${config.cliName} test session (with permission bypass)
117+
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "${config.startCommand}")
118+
119+
# Send input to the CLI
120+
./scripts/tmux/tmux-cli.sh send "$SESSION" "/help"
121+
122+
# Capture output (optionally wait first)
123+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --wait 3
124+
125+
# Stop the session when done
126+
./scripts/tmux/tmux-cli.sh stop "$SESSION"
127+
128+
# Stop all test sessions
129+
./scripts/tmux/tmux-cli.sh stop --all
130+
\`\`\`
131+
132+
### Individual Scripts (More Options)
133+
134+
\`\`\`bash
135+
# Start with custom settings
136+
./scripts/tmux/tmux-start.sh --command "${config.startCommand}" --name ${config.shortName}-test --width 160 --height 40
137+
138+
# Send text (auto-presses Enter)
139+
./scripts/tmux/tmux-send.sh ${config.shortName}-test "your prompt here"
140+
141+
# Send without pressing Enter
142+
./scripts/tmux/tmux-send.sh ${config.shortName}-test "partial" --no-enter
143+
144+
# Send special keys
145+
./scripts/tmux/tmux-send.sh ${config.shortName}-test --key Escape
146+
./scripts/tmux/tmux-send.sh ${config.shortName}-test --key C-c
147+
148+
# Capture with colors
149+
./scripts/tmux/tmux-capture.sh ${config.shortName}-test --colors
150+
151+
# Save capture to file
152+
./scripts/tmux/tmux-capture.sh ${config.shortName}-test -o output.txt
153+
\`\`\`
154+
155+
## Why These Scripts?
156+
157+
The scripts handle **bracketed paste mode** automatically. Standard \`tmux send-keys\` drops characters with TUI apps like ${config.cliName} due to how the CLI processes keyboard input. The helper scripts wrap input in escape sequences (\`\\e[200~...\\e[201~\`) so you don't have to.
158+
159+
${TMUX_SESSION_DOCS}
160+
161+
${TMUX_DEBUG_TIPS}`
162+
}
163+
164+
export function getDefaultReviewModeInstructions(config: CliAgentConfig): string {
165+
return `## Review Mode Instructions
166+
167+
In review mode, you send a detailed review prompt to ${config.cliName}. The prompt MUST start with the word "review" and include specific areas of concern.
168+
169+
${REVIEW_CRITERIA}
170+
171+
### Workflow
172+
173+
1. **Start ${config.cliName}** with permission bypass:
174+
\`\`\`bash
175+
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "${config.startCommand}")
176+
\`\`\`
177+
178+
2. **Wait for CLI to initialize**, then capture:
179+
\`\`\`bash
180+
sleep 3
181+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "initial-state"
182+
\`\`\`
183+
184+
3. **Send a detailed review prompt** (MUST start with "review"):
185+
\`\`\`bash
186+
./scripts/tmux/tmux-cli.sh send "$SESSION" "Review [files/directories from prompt]. Look for:
187+
188+
1. CODE ORGANIZATION: Poor structure, unclear separation of concerns, functions doing too much
189+
2. OVER-ENGINEERING: Unnecessary abstractions, premature optimization, complex patterns where simple would work
190+
3. AI SLOP: Verbose comments ('it\\'s important to note'), excessive disclaimers, inconsistent style, generic names, redundant explanations
191+
4. SYSTEMS THINKING: Missing error handling strategy, no scaling consideration, ignored edge cases, lack of observability
192+
193+
For each issue found, specify the file, line number, what's wrong, and how to fix it. Be direct and specific."
194+
\`\`\`
195+
196+
4. **Wait for and capture the review output** (reviews take longer):
197+
\`\`\`bash
198+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "review-output" --wait 60
199+
\`\`\`
200+
201+
If the review is still in progress, wait and capture again:
202+
\`\`\`bash
203+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "review-output-continued" --wait 30
204+
\`\`\`
205+
206+
5. **Parse the review output** and populate \`reviewFindings\` with:
207+
- \`file\`: Path to the file with the issue
208+
- \`severity\`: "critical", "warning", "suggestion", or "info"
209+
- \`line\`: Line number if mentioned
210+
- \`finding\`: Description of the issue
211+
- \`suggestion\`: How to fix it
212+
213+
6. **Clean up**:
214+
\`\`\`bash
215+
./scripts/tmux/tmux-cli.sh stop "$SESSION"
216+
\`\`\``
217+
}
218+
219+
export function getInstructionsPrompt(config: CliAgentConfig): string {
220+
const reviewModeInstructions = config.reviewModeInstructions ?? getDefaultReviewModeInstructions(config)
221+
222+
return `Instructions:
223+
224+
Check the \`mode\` parameter to determine your operation:
225+
- If \`mode\` is "review": follow **Review Mode** instructions
226+
- Otherwise: follow **Test Mode** instructions (default)
227+
228+
---
229+
230+
## Test Mode Instructions
231+
232+
1. **Use the helper scripts** in \`scripts/tmux/\` - they handle bracketed paste mode automatically
233+
234+
2. **Start a ${config.cliName} test session** with permission bypass:
235+
\`\`\`bash
236+
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "${config.startCommand}")
237+
\`\`\`
238+
239+
3. **Verify the CLI started** by capturing initial output:
240+
\`\`\`bash
241+
./scripts/tmux/tmux-cli.sh capture "$SESSION"
242+
\`\`\`
243+
244+
4. **Send commands** and capture responses:
245+
\`\`\`bash
246+
./scripts/tmux/tmux-cli.sh send "$SESSION" "your command here"
247+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --wait 3
248+
\`\`\`
249+
250+
5. **Always clean up** when done:
251+
\`\`\`bash
252+
./scripts/tmux/tmux-cli.sh stop "$SESSION"
253+
\`\`\`
254+
255+
6. **Use labels when capturing** to create a clear paper trail:
256+
\`\`\`bash
257+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "initial-state"
258+
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "after-help-command" --wait 2
259+
\`\`\`
260+
261+
---
262+
263+
${reviewModeInstructions}
264+
265+
---
266+
267+
## Output (Both Modes)
268+
269+
**Report results using set_output** - You MUST call set_output with structured results:
270+
- \`overallStatus\`: "success", "failure", or "partial"
271+
- \`summary\`: Brief description of what was tested/reviewed
272+
- \`testResults\`: Array of test outcomes (for test mode)
273+
- \`scriptIssues\`: Array of any problems with the helper scripts
274+
- \`captures\`: Array of capture paths with labels
275+
- \`reviewFindings\`: Array of code review findings (for review mode)
276+
277+
**If a helper script doesn't work correctly**, report it in \`scriptIssues\` with:
278+
- \`script\`: Which script failed
279+
- \`issue\`: What went wrong
280+
- \`errorOutput\`: The actual error message
281+
- \`suggestedFix\`: How the parent agent should fix the script
282+
283+
**Always include captures** in your output so the parent agent can see what you saw.
284+
285+
For advanced options, run \`./scripts/tmux/tmux-cli.sh help\` or check individual scripts with \`--help\`.`
286+
}

.agents/lib/cli-agent-schemas.ts

Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,72 @@
1+
// Shared output schema for CLI tester agents. testResults for test mode, reviewFindings for review mode.
2+
export const outputSchema = {
3+
type: 'object' as const,
4+
properties: {
5+
overallStatus: {
6+
type: 'string' as const,
7+
enum: ['success', 'failure', 'partial'],
8+
description: 'Overall test outcome',
9+
},
10+
summary: {
11+
type: 'string' as const,
12+
description: 'Brief summary of what was tested and the outcome',
13+
},
14+
testResults: {
15+
type: 'array' as const,
16+
items: {
17+
type: 'object' as const,
18+
properties: {
19+
testName: { type: 'string' as const, description: 'Name/description of the test' },
20+
passed: { type: 'boolean' as const, description: 'Whether the test passed' },
21+
details: { type: 'string' as const, description: 'Details about what happened' },
22+
capturedOutput: { type: 'string' as const, description: 'Relevant output captured from the CLI' },
23+
},
24+
required: ['testName', 'passed'],
25+
},
26+
description: 'Array of individual test results',
27+
},
28+
scriptIssues: {
29+
type: 'array' as const,
30+
items: {
31+
type: 'object' as const,
32+
properties: {
33+
script: { type: 'string' as const, description: 'Which script had the issue (e.g., "tmux-start.sh", "tmux-send.sh")' },
34+
issue: { type: 'string' as const, description: 'What went wrong when using the script' },
35+
errorOutput: { type: 'string' as const, description: 'The actual error message or unexpected output' },
36+
suggestedFix: { type: 'string' as const, description: 'Suggested fix or improvement for the parent agent to implement' },
37+
},
38+
required: ['script', 'issue', 'suggestedFix'],
39+
},
40+
description: 'Issues encountered with the helper scripts that the parent agent should fix',
41+
},
42+
captures: {
43+
type: 'array' as const,
44+
items: {
45+
type: 'object' as const,
46+
properties: {
47+
path: { type: 'string' as const, description: 'Path to the capture file (relative to project root)' },
48+
label: { type: 'string' as const, description: 'What this capture shows (e.g., "initial-cli-state", "after-help-command")' },
49+
timestamp: { type: 'string' as const, description: 'When the capture was taken' },
50+
},
51+
required: ['path', 'label'],
52+
},
53+
description: 'Paths to saved terminal captures for debugging - check debug/tmux-sessions/{session}/',
54+
},
55+
reviewFindings: {
56+
type: 'array' as const,
57+
items: {
58+
type: 'object' as const,
59+
properties: {
60+
file: { type: 'string' as const, description: 'File path where the issue was found' },
61+
severity: { type: 'string' as const, enum: ['critical', 'warning', 'suggestion', 'info'], description: 'Severity level of the finding' },
62+
line: { type: 'number' as const, description: 'Line number (if applicable)' },
63+
finding: { type: 'string' as const, description: 'Description of the issue or suggestion' },
64+
suggestion: { type: 'string' as const, description: 'Suggested fix or improvement' },
65+
},
66+
required: ['file', 'severity', 'finding'],
67+
},
68+
description: 'Code review findings (only populated in review mode)',
69+
},
70+
},
71+
required: ['overallStatus', 'summary', 'scriptIssues', 'captures'],
72+
}

.agents/lib/cli-agent-types.ts

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
export interface InputParamDefinition {
2+
type: 'string' | 'number' | 'boolean' | 'array' | 'object'
3+
description?: string
4+
enum?: string[]
5+
}
6+
7+
// Prevent extraInputParams from overriding 'mode' at compile time
8+
export type ExtraInputParams = Omit<Record<string, InputParamDefinition>, 'mode'>
9+
10+
export interface CliAgentConfig {
11+
id: string
12+
displayName: string
13+
cliName: string
14+
/** Used for session naming, e.g., 'claude-code' -> sessions named 'claude-code-test' */
15+
shortName: string
16+
startCommand: string
17+
permissionNote: string
18+
model: string
19+
spawnerPromptExtras?: string
20+
extraInputParams?: ExtraInputParams
21+
reviewModeInstructions?: string
22+
cliSpecificDocs?: string
23+
}

0 commit comments

Comments
 (0)