Commit 1cce0bc

[buffbench] base-layer with iterative planner; spec all at once
1 parent 17e3f33 commit 1cce0bc

6 files changed: +182 -24 lines changed
.agents/base2/base-layer.ts

Lines changed: 10 additions & 7 deletions
@@ -37,8 +37,8 @@ const definition: SecretAgentDefinition = {
     'read-only-commander',
     'decomposing-thinker',
     'code-sketcher',
+    'iterative-planner',
     'editor',
-    'decomposing-reviewer',
     'reviewer',
     'context-pruner',
   ],
@@ -83,25 +83,28 @@ The user asks you to implement a new feature. You respond in multiple steps:
 1a. Read all the relevant files using the read_files tool.
 2. Spawn one more file explorer and one more find-all-referencer with different prompts to find relevant files; spawn a decomposing thinker with questions on a key decision; spawn a decomposing thinker to plan out the feature part-by-part. Spawn a code sketcher to sketch out one key section of the code that is the most important or difficult.
 2a. Read all the relevant files using the read_files tool.
-3. Spawn a decomposing thinker to answer final design and implementation questions and critique the code sketch that was produced. Spawn one more code sketcher to sketch another key section.
+3. Spawn an iterative-planner with a step-by-step initial plan. Spawn one more code sketcher to sketch another key section.
 4. Spawn two editors to implement all the changes.
 5. Spawn a reviewer to review the changes made by the editors.
 
 
-## Guidelines
+## Spawning agents guidelines
 
 - **Sequence agents properly:** Keep in mind dependencies when spawning different agents:
   - Spawn file explorers, find-all-referencers, and researchers before thinkers so that the thinkers can use the file/research results to come up with better conclusions.
   - Spawn thinkers before editors so editors can use the insights from the thinkers.
   - Reviewers should be spawned after editors.
-- **Use the decomposing thinker also to check what context you are missing:** Ask what context you don't have for specific subtasks that you could still acquire (with file pickers or find-all-referencers or researchers or using the read_files tool). Getting more context is one of the most important things you should do before editing or coding anything.
-- **Spawn editors later:** Only spawn editors after gathering all the context.
-- **Stop and ask for guidance:** You should feel free to stop and ask the user for guidance if you're stuck or don't know what to try next, or need a clarification.
+- **Use the decomposing thinker also to check what context you are missing:** Ask what context you don't have for specific subtasks that you could still acquire (with file pickers or find-all-referencers or researchers or using the read_files tool). Getting more context is one of the most important things you should do before planning or editing or coding anything.
+- **Once you've gathered all the context you need, create a plan:** Spawn an iterative-planner with a step-by-step initial plan, or if it's not a complex task simply write out your plan as a bullet point list.
+- **Spawn editors later:** Only spawn editors after gathering all the context and creating a plan.
 - **No need to include context:** When prompting an agent, realize that many agents can already see the entire conversation history, so you can be brief in prompting them without needing to include context.
+
+## General guidelines
+- **Stop and ask for guidance:** You should feel free to stop and ask the user for guidance if you're stuck or don't know what to try next, or need a clarification.
 - **Be careful about terminal commands:** Be careful about instructing subagents to run terminal commands that could be destructive or have effects that are hard to undo (e.g. git push, running scripts that could alter production environments, installing packages globally, etc). Don't do any of these unless the user explicitly asks you to.
 `,
 
-  stepPrompt: `Don't forget to spawn agents that could help, especially: the file-explorer and find-all-referencer to get codebase context, the decomposing thinker to think about key decisions, the code sketcher to sketch out the key sections of code, and the reviewer/decomposing-reviewer to review code changes made by the editor(s).`,
+  stepPrompt: `Don't forget to spawn agents that could help, especially: the file-explorer and find-all-referencer to get codebase context, the decomposing thinker to think about key decisions, the code sketcher to sketch out the key sections of code, the iterative-planner to create a plan, and the reviewer/decomposing-reviewer to review code changes made by the editor(s).`,
 
   handleSteps: function* ({ prompt, params }) {
     let steps = 0
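
The sequencing guidelines in the prompt above (gather context before thinking and planning, think or plan before editing, review after editing) can be read as a small prerequisite check. The following is only an illustrative sketch and is not part of this commit: the `PREREQUISITES` table and the `findSequencingViolations` helper are hypothetical names, and the agent ids are taken from the prompt text.

```typescript
// Hypothetical prerequisite table (an assumption for illustration): for each
// agent type, at least one of the listed agent types should already have been
// spawned earlier in the run.
const PREREQUISITES: Record<string, string[]> = {
  'decomposing-thinker': ['file-explorer', 'find-all-referencer'],
  'iterative-planner': ['file-explorer', 'find-all-referencer'],
  editor: ['decomposing-thinker', 'iterative-planner'],
  reviewer: ['editor'],
}

// Returns one message per agent that was spawned before any of its prerequisites.
function findSequencingViolations(spawnOrder: string[]): string[] {
  const seen = new Set<string>()
  const violations: string[] = []
  for (const agent of spawnOrder) {
    const prereqs = PREREQUISITES[agent] ?? []
    if (prereqs.length > 0 && !prereqs.some((p) => seen.has(p))) {
      violations.push(`${agent} spawned before any of: ${prereqs.join(', ')}`)
    }
    seen.add(agent)
  }
  return violations
}

// A run that follows the prompt's ordering, and one that skips straight to editing.
const orderedRun = findSequencingViolations([
  'file-explorer',
  'decomposing-thinker',
  'iterative-planner',
  'editor',
  'reviewer',
])
const rushedRun = findSequencingViolations(['editor', 'reviewer'])
```

Here `orderedRun` is empty, while `rushedRun` contains a single entry for the editor, since nothing was spawned to think or plan before it.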
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+import { publisher } from '../constants'
+import type { SecretAgentDefinition } from '../types/secret-agent-definition'
+
+const definition: SecretAgentDefinition = {
+  id: 'iterative-planner',
+  publisher,
+  model: 'anthropic/claude-sonnet-4.5',
+  displayName: 'Iterative Planner',
+  spawnerPrompt:
+    'Spawn this agent when you need to create a detailed implementation plan through iterative refinement with critique and validation steps. Spawn it with a rough step-by-step initial plan.',
+  inputSchema: {
+    prompt: {
+      type: 'string',
+      description: 'The initial step-by-step plan to refine and validate',
+    },
+  },
+  includeMessageHistory: true,
+  inheritParentSystemPrompt: true,
+  outputMode: 'last_message',
+  toolNames: ['spawn_agents'],
+  spawnableAgents: ['plan-critiquer'],
+
+  instructionsPrompt: `You are an expert implementation planner. Your job is to:
+- Take an initial high-level plan and add key implementation details. Include important decisions and alternatives. Identify key interfaces and contracts between components and key pieces of code. Add validation steps to ensure correctness. Identify which steps can be done in parallel.
+- Spawn a plan-critiquer agent with the entire revised, fleshed-out plan.
+- Incorporate feedback from the critiques to output a final plan.
+
+Instructions:
+
+1. Immediately spawn the plan-critiquer agent with an updated plan:
+
+Transform the initial plan into a detailed implementation guide that includes:
+
+**All User Requirements:**
+- Make sure the plan addresses all the requirements in the user's request, and does not do other stuff that the user did not ask for.
+
+**Key Decisions & Trade-offs:**
+- Architecture decisions and rationale
+- Cruxes of the plan
+- Alternatives considered
+
+**Interfaces & Contracts:**
+- Clear API signatures between components
+- Key tricky bits of code (keep this short though)
+
+**Validation Steps:**
+- How to verify each step works correctly
+- Include explicit verification steps when it makes sense in the plan.
+
+**Dependencies & Parallelism:**
+- Identify which steps depend on each other and which can be done in parallel.
+
+Feel free to completely change the initial plan if you think of something better.
+
+2. After receiving the critique, revise the plan to address all concerns while maintaining simplicity and clarity. Output the final plan.
+
+## Guidelines for the plan
+
+- IMPORTANT: Don't overengineer the plan -- prefer minimalism and simplicity in almost every case. Streamline the final plan to be as minimal as possible.
+- IMPORTANT: You must pay attention to the user's request! Make sure to address all the requirements in the user's request, and nothing more.
+- Reuse existing code whenever possible -- you may need to seek out helpers from other parts of the codebase.
+- Use existing patterns and conventions from the codebase. Keep naming consistent. It's good to read other files that could have relevant patterns and examples to understand the conventions.
+- Try not to modify more files than necessary.`,
+}
+
+export default definition
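
The planner's instructions describe a three-stage flow: expand the initial plan, send the expanded plan to a critiquer, then revise using the feedback. As a rough sketch of that control flow only, the stages can be modelled as plain callbacks. Everything here is an assumption for illustration: `PlanCritique`, `PlannerSteps`, and `refinePlan` are hypothetical names, and the real agent performs these stages through model output and `spawn_agents`, not through function calls.

```typescript
// Hypothetical shape loosely mirroring the critiquer's structured output.
interface PlanCritique {
  critique: string
  suggestions: unknown[]
}

// Hypothetical callbacks standing in for model calls and agent spawns.
interface PlannerSteps {
  expand: (initialPlan: string) => string
  critique: (detailedPlan: string) => PlanCritique
  revise: (detailedPlan: string, feedback: PlanCritique) => string
}

// Runs the three stages in order: expand, critique, revise.
function refinePlan(initialPlan: string, steps: PlannerSteps): string {
  const detailedPlan = steps.expand(initialPlan)
  const feedback = steps.critique(detailedPlan)
  return steps.revise(detailedPlan, feedback)
}

// Usage with trivial stand-ins for each stage.
const finalPlan = refinePlan('1. Add flag', {
  expand: (plan) => `${plan}\n2. Validate with a unit test`,
  critique: (plan) => ({
    critique: plan.includes('Validate') ? 'ok' : 'missing validation',
    suggestions: [],
  }),
  revise: (plan, feedback) =>
    feedback.critique === 'ok' ? plan : `${plan}\n(revised)`,
})
```

Keeping the stages as injected callbacks makes the ordering explicit: the critique always sees the expanded plan, and the revision always sees both the expanded plan and the critique.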

.agents/planners/plan-critiquer.ts

Lines changed: 89 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,89 @@
+import { publisher } from '../constants'
+import type { SecretAgentDefinition } from '../types/secret-agent-definition'
+import type { ToolMessage } from '../types/util-types'
+
+const definition: SecretAgentDefinition = {
+  id: 'plan-critiquer',
+  publisher,
+  model: 'anthropic/claude-sonnet-4.5',
+  displayName: 'Plan Critiquer',
+  spawnerPrompt:
+    'Analyzes implementation plans to identify areas of concern and proposes solutions through parallel thinking.',
+  inputSchema: {
+    prompt: {
+      type: 'string',
+      description:
+        "The implementation plan to critique. Give a step-by-step breakdown of what you will do to fulfill the user's request.",
+    },
+  },
+  includeMessageHistory: true,
+  inheritParentSystemPrompt: true,
+  outputMode: 'structured_output',
+  outputSchema: {
+    type: 'object',
+    properties: {
+      critique: {
+        type: 'string',
+        description: 'Analysis of the plan with identified areas of concern',
+      },
+      suggestions: {
+        type: 'array',
+        items: {
+          type: 'object',
+        },
+        description: 'Suggestions for each area of concern',
+      },
+    },
+    required: ['critique', 'suggestions'],
+  },
+  toolNames: ['spawn_agents', 'set_output'],
+  spawnableAgents: ['decomposing-thinker'],
+
+  instructionsPrompt: `You are an expert plan reviewer. Your job is to:
+1. Analyze the implementation plan for potential issues and better alternatives.
+2. Identify 2-5 specific areas of concern that need deeper analysis.
+3. Spawn a decomposing-thinker agent with the concerns as prompts. For each concern, formulate it as a specific question that can be answered by the thinker agent.
+
+## Guidelines for the critique
+
+IMPORTANT: You must pay attention to the user's request! Make sure to address all the requirements in the user's request, and nothing more.
+
+For the plan:
+- Focus on implementing the simplest solution that will accomplish the task in a high-quality manner.
+- Reuse existing code whenever possible -- you may need to seek out helpers from other parts of the codebase.
+- Use existing patterns and conventions from the codebase. Keep naming consistent. It's good to read other files that could have relevant patterns and examples to understand the conventions.
+- Try not to modify more files than necessary.
+`,
+
+  handleSteps: function* () {
+    const { agentState } = yield 'STEP'
+
+    const lastAssistantMessage = agentState.messageHistory
+      .filter((m) => m.role === 'assistant')
+      .pop()
+
+    const critique =
+      typeof lastAssistantMessage?.content === 'string'
+        ? lastAssistantMessage.content
+        : ''
+    const toolResult = agentState.messageHistory
+      .filter((m) => m.role === 'tool' && m.content.toolName === 'spawn_agents')
+      .pop() as ToolMessage
+
+    const suggestions = toolResult
+      ? toolResult.content.output.map((result) =>
+          result.type === 'json' ? result.value : {},
+        )[0]
+      : []
+
+    yield {
+      toolName: 'set_output',
+      input: {
+        critique,
+        suggestions,
+      },
+    }
+  },
+}
+
+export default definition
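
The `handleSteps` generator above builds its structured output from two places in the message history: the last assistant message (the critique text) and the last `spawn_agents` tool result (the suggestions). A minimal standalone sketch of that extraction follows. The message types here are simplified, hypothetical stand-ins for the real ones in `../types/util-types`, and `extractCritiqueOutput` is an illustrative name, not a function from the codebase.

```typescript
// Simplified, hypothetical message shapes (assumptions for this sketch only).
type UserMessage = { role: 'user'; content: string }
type AssistantMessage = { role: 'assistant'; content: string | unknown[] }
type ToolResultPart =
  | { type: 'json'; value: unknown }
  | { type: 'text'; value: string }
type ToolMessage = {
  role: 'tool'
  content: { toolName: string; output: ToolResultPart[] }
}
type Message = UserMessage | AssistantMessage | ToolMessage

function extractCritiqueOutput(messageHistory: Message[]): {
  critique: string
  suggestions: unknown
} {
  // Critique: text of the most recent assistant message, or '' if it is not a string.
  const lastAssistant = messageHistory
    .filter((m): m is AssistantMessage => m.role === 'assistant')
    .pop()
  const critique =
    typeof lastAssistant?.content === 'string' ? lastAssistant.content : ''

  // Suggestions: first output part of the most recent spawn_agents tool result.
  const toolResult = messageHistory
    .filter(
      (m): m is ToolMessage =>
        m.role === 'tool' && m.content.toolName === 'spawn_agents',
    )
    .pop()
  const suggestions = toolResult
    ? toolResult.content.output.map((part) =>
        part.type === 'json' ? part.value : {},
      )[0]
    : []

  return { critique, suggestions }
}

// Usage: the later assistant message wins, and the JSON value is passed through.
const extracted = extractCritiqueOutput([
  { role: 'user', content: 'Here is the plan' },
  { role: 'assistant', content: 'draft critique' },
  {
    role: 'tool',
    content: {
      toolName: 'spawn_agents',
      output: [{ type: 'json', value: [{ concern: 'naming' }] }],
    },
  },
  { role: 'assistant', content: 'final critique' },
])
```

Two behaviours of this pattern are worth noting. With no `spawn_agents` result in the history, `suggestions` falls back to `[]`. With a result whose `output` array is empty, indexing `[0]` yields `undefined`, so a caller that needs an array may want an explicit fallback.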

evals/git-evals/run-eval-set.ts

Lines changed: 15 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -134,21 +134,21 @@ async function runEvalSet(options: {
       evalDataPath: path.join(__dirname, 'eval-codebuff2.json'),
       outputDir,
     },
-    {
-      name: 'manifold',
-      evalDataPath: path.join(__dirname, 'eval-manifold2.json'),
-      outputDir,
-    },
-    {
-      name: 'plane',
-      evalDataPath: path.join(__dirname, 'eval-plane.json'),
-      outputDir,
-    },
-    {
-      name: 'saleor',
-      evalDataPath: path.join(__dirname, 'eval-saleor.json'),
-      outputDir,
-    },
+    // {
+    //   name: 'manifold',
+    //   evalDataPath: path.join(__dirname, 'eval-manifold2.json'),
+    //   outputDir,
+    // },
+    // {
+    //   name: 'plane',
+    //   evalDataPath: path.join(__dirname, 'eval-plane.json'),
+    //   outputDir,
+    // },
+    // {
+    //   name: 'saleor',
+    //   evalDataPath: path.join(__dirname, 'eval-saleor.json'),
+    //   outputDir,
+    // },
   ]
 
   console.log(`Running ${evalConfigs.length} evaluations:`)

evals/git-evals/run-single-eval-process.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,7 +74,7 @@ async function main() {
     fingerprintId,
     codingAgent as any,
     agent,
-    false,
+    true,
   )
 
   // Check again after long-running operation

evals/git-evals/run-single-eval.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -199,7 +199,7 @@ async function runSingleEvalTask(options: {
     fingerprintId,
     codingAgent,
     agentType,
-    false,
+    true,
   )
 
   const duration = Date.now() - startTime
