Commit f2a3d91

Update best of n selector, editor implementor to think before selecting/implementing
1 parent 93f6495 commit f2a3d91

4 files changed: 19 additions and 27 deletions

.agents/base2/base2.ts

Lines changed: 1 addition & 1 deletion
@@ -304,7 +304,7 @@ ${buildArray(
  useEditor &&
  '- IMPORTANT: You must spawn the editor agent to implement the changes after you have gathered all the context you need. This agent will do the best job of implementing the changes so you must spawn it for all non-trivial changes. Do not pass any prompt or params to the editor agent when spawning it. It will make its own best choices of what to do.',
  isMax &&
- `- IMPORTANT: You must spawn the editor-best-of-n-max agent to implement non-trivial code changes, since it will generate the best code changes from multiple implementation proposals. This is the best way to make high quality code changes -- strongly prefer using this agent over the str_replace or write_file tools, unless the change is very straightforward and obvious.`,
+ `- IMPORTANT: You must spawn the editor-best-of-n-max agent to implement non-trivial code changes, since it will generate the best code changes from multiple implementation proposals. This is the best way to make high quality code changes -- strongly prefer using this agent over the str_replace or write_file tools, unless the change is very straightforward and obvious. Do not pass any prompt or params to the editor agent when spawning it. It will make its own best choices of what to do.`,
  (isDefault || isFast) &&
  '- Implement the changes using the str_replace or write_file tools.',
  isFast &&

.agents/editor/best-of-n/best-of-n-selector.ts

Lines changed: 3 additions & 10 deletions
@@ -26,11 +26,6 @@ export const createBestOfNSelector = (options: {
  effort: 'high',
  },
  }),
- ...(isOpus && {
- reasoningOptions: {
- max_tokens: 4000,
- },
- }),
  displayName: isGpt5
  ? 'Best-of-N GPT-5 Implementation Selector'
  : isGemini
@@ -114,12 +109,10 @@ Try to select an implementation that fulfills all the requirements in the user's
  ## Response Format
 
  ${
- isSonnet
- ? `Use <think> tags to briefly consider the implementations as needed to pick the best implementation.
-
- If the best one is obvious or the implementations are very similar, you may not need to think very much (a few words suffice) or you may not need to use think tags at all, just pick the best one and output it. You have a dual goal of picking the best implementation and being fast (using as few words as possible).
+ isSonnet || isOpus
+ ? `Use <think> tags to write out your thoughts about the implementations as needed to pick the best implementation. IMPORTANT: You should think really really hard to make sure you pick the absolute best implementation! As soon as you know for sure which implementation is the best, you should output your choice.
 
- Then, do not write any other explanations AT ALL. You should directly output a single tool call to set_output with the selected implementationId.`
+ Then, do not write any other explanations AT ALL. You should directly output a single tool call to set_output with the selected implementationId and short reason.`
  : `Output a single tool call to set_output with the selected implementationId. Do not write anything else.`
  }`,
  }
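
For readers unfamiliar with the `...(isOpus && { ... })` construct removed in the first hunk of this file, the following is a minimal sketch of how a conditional object spread behaves. The `buildOptions` helper, the `SelectorOptions` type, and the `model` field are hypothetical names chosen for illustration; only `reasoningOptions` and `max_tokens: 4000` come from the diff.

```typescript
// Hypothetical illustration of a conditional object spread; not the repository's code.
type SelectorOptions = {
  model: string
  reasoningOptions?: { max_tokens: number }
}

function buildOptions(model: string, isOpus: boolean): SelectorOptions {
  return {
    model,
    // Spreading `false` contributes no keys, so `reasoningOptions`
    // is present only when `isOpus` is true.
    ...(isOpus && { reasoningOptions: { max_tokens: 4000 } }),
  }
}
```

Under this reading, deleting the spread means the key is never set for any model, rather than being set to a different value.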

.agents/editor/best-of-n/editor-best-of-n.ts

Lines changed: 10 additions & 13 deletions
@@ -52,7 +52,7 @@ export function createBestOfNEditor(
  properties: {
  n: {
  type: 'number',
- description: `Number of parallel implementor agents to spawn. Defaults to ${isDefault ? 4 : 5}. Use fewer for simple tasks and max of 10 for complex tasks.`,
+ description: `Number of parallel implementor agents to spawn. Defaults to ${isMax ? 4 : 3}. Use fewer for simple tasks and max of 10 for complex tasks.`,
  },
  },
  },
@@ -73,7 +73,7 @@ function* handleStepsDefault({
  }: AgentStepContext): ReturnType<
  NonNullable<SecretAgentDefinition['handleSteps']>
  > {
- const DEFAULT_N = 4
+ const DEFAULT_N = 3
  const selectorAgent = 'best-of-n-selector'
  const n = Math.min(
  10,
235235
}: AgentStepContext): ReturnType<
236236
NonNullable<SecretAgentDefinition['handleSteps']>
237237
> {
238-
const MAX_N = 5
238+
const MAX_N = 4
239239
const selectorAgent = 'best-of-n-selector-opus'
240240
const n = Math.min(
241241
10,
@@ -296,6 +296,10 @@ function* handleStepsMax({
  implementorResults,
  ) as any[]
 
+ logger.info(
+ { implementorResults, spawnedImplementations },
+ 'spawnedImplementations',
+ )
  // Extract all the plans from the structured outputs
  const letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
  // Parse implementations from spawn results
@@ -304,14 +308,9 @@ function* handleStepsMax({
  content:
  'errorMessage' in result
  ? `Error: ${result.errorMessage}`
- : extractLastMessageText(result),
+ : extractLastMessageText(result) ?? '',
  }))
 
- logger.info(
- { spawnedImplementations, implementations },
- 'spawnedImplementations',
- )
-
  // Spawn selector with implementations as params
  const { toolResult: selectorResult, agentState: selectorAgentState } = yield {
  toolName: 'spawn_agents',
@@ -432,14 +431,14 @@ function* handleStepsOpus({
  }: AgentStepContext): ReturnType<
  NonNullable<SecretAgentDefinition['handleSteps']>
  > {
- const DEFAULT_N = 5
+ const DEFAULT_N = 3
  const selectorAgent = 'best-of-n-selector-opus'
  const n = Math.min(
  10,
  Math.max(1, (params?.n as number | undefined) ?? DEFAULT_N),
  )
 
- // Spawn implementor agents: 1 gemini + rest sonnet (if n >= 2)
+ // Spawn implementor agents
  const implementorAgents = []
  for (let i = 0; i < n; i++) {
  implementorAgents.push({
@@ -459,8 +458,6 @@ function* handleStepsOpus({
  const spawnedImplementations =
  extractSpawnResults<{ text: string }[]>(implementorResults)
 
- logger.info({ spawnedImplementations }, 'spawnedImplementations')
-
  // Extract all the plans from the structured outputs
  const letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
  // Parse implementations from spawn results
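
The handlers in this file all resolve `n` with the same expression, `Math.min(10, Math.max(1, params?.n ?? DEFAULT_N))`, so the changed defaults only matter when the caller omits `n`. A minimal standalone sketch of that clamp follows; `resolveN` is a hypothetical name introduced here for illustration and does not appear in the commit.

```typescript
// Hypothetical standalone version of the clamp shown in the hunks above.
function resolveN(requested: number | undefined, defaultN: number): number {
  // `??` falls back to the default only for null or undefined,
  // then the result is clamped into the range [1, 10].
  return Math.min(10, Math.max(1, requested ?? defaultN))
}

console.log(resolveN(undefined, 3)) // 3: the default applies when n is omitted
console.log(resolveN(0, 3)) // 1: zero is a provided value, so it is clamped up, not defaulted
console.log(resolveN(25, 4)) // 10: requests above the cap are clamped down
```

Because `??` is used rather than `||`, an explicit `0` is clamped to 1 instead of being replaced by the default.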

.agents/editor/best-of-n/editor-implementor.ts

Lines changed: 5 additions & 3 deletions
@@ -72,12 +72,14 @@ ${
  isGpt5 || isGemini
  ? ``
  : `
- You can also use <think> tags interspersed between tool calls to think about the best way to implement the changes. Keep these thoughts very brief. You may not need to use think tags at all.
+ IMPORTANT: Before you start writing your implementation, you should use <think> tags to think about the best way to implement the changes. You should think really really hard to make sure you implement the changes in the best way possible. Take as much time as you to think through all the cases to produce the best changes.
+
+ You can also use <think> tags interspersed between tool calls to think about the best way to implement the changes.
 
  <example>
 
  <think>
- [ Thoughts about the best way to implement the feature ]
+ [ Long think about the best way to implement the changes ]
  </think>
 
  <codebuff_tool_call>
@@ -99,7 +101,7 @@ You can also use <think> tags interspersed between tool calls to think about the
  </example>`
  }
 
- After the edit tool calls, you can optionally mention any follow-up steps to take, like deleting a file, or a sepcific way to validate the changes. There's no need to use the set_output tool as your entire response will be included in the output.
+ After the edit tool calls, you can optionally mention any follow-up steps to take, like deleting a file, or a specific way to validate the changes. There's no need to use the set_output tool as your entire response will be included in the output.
 
  Your implementation should:
  - Be complete and comprehensive