New iteration of task researcher with gpt-5 pro & gpt-5-worker

jahooma · jahooma · commit 4dea9f293e78 · 2025-10-24T17:26:23.000-07:00
diff --git a/.agents/base2/base2-gpt-5-worker.ts b/.agents/base2/base2-gpt-5-worker.ts
@@ -33,9 +33,9 @@ The user asks you to implement a new feature. You respond in multiple steps:
 2a. Read all the relevant files using the read_files tool.
 3. Use the str_replace or write_file tool to make the changes.
 4. Test your changes by running appropriate validation commands for the project (e.g. typechecks, tests, lints, etc.). You may have to explore the project to find the appropriate commands.
-5. Inform the parent agent you're done with your edits, but that it should double-check your work.`,
+5. End your turn.`,
 
-  stepPrompt: `Don't forget to spawn agents that could help, especially: the file-picker-max and code-searcher to get codebase context.`,
+  stepPrompt: undefined,
 }
 
 export default definition
diff --git a/.agents/base2/task-researcher/base2-with-task-researcher.ts b/.agents/base2/task-researcher/base2-with-task-researcher.ts
@@ -1,12 +1,10 @@
 import { buildArray } from '@codebuff/common/util/array'
 
 import { publisher } from '../../constants'
-import {
-  PLACEHOLDER,
-  type SecretAgentDefinition,
-} from '../../types/secret-agent-definition'
+import { type SecretAgentDefinition } from '../../types/secret-agent-definition'
 
 import type { ToolCall } from 'types/agent-definition'
+import type { UserMessage } from 'types/util-types'
 
 export const createBase2WithTaskResearcher: () => Omit<
   SecretAgentDefinition,
@@ -35,101 +33,43 @@ export const createBase2WithTaskResearcher: () => Omit<
     },
     outputMode: 'last_message',
     includeMessageHistory: false,
-    toolNames: ['spawn_agents', 'read_files', 'str_replace', 'write_file'],
+    toolNames: [
+      'spawn_agents',
+      'spawn_agent_inline',
+      'read_files',
+      'str_replace',
+      'write_file',
+    ],
     spawnableAgents: buildArray(
-      'task-researcher',
+      'task-researcher2',
       'file-picker-max',
       'code-searcher',
       'directory-lister',
       'glob-matcher',
       'researcher-web',
       'researcher-docs',
       'commander',
-      'code-reviewer',
-      'validator',
+      'planner-pro-with-files-input',
+      'base2-gpt-5-worker',
       'context-pruner',
     ),
 
-    systemPrompt: `You are Buffy, a strategic coding assistant that orchestrates complex coding tasks through specialized sub-agents.
-
-# Layers
-
-You spawn agents in "layers". Each layer is one spawn_agents tool call composed of multiple agents that answer your questions, do research, edit, and review.
-
-In between layers, you are encouraged to use the read_files tool to read files that you think are relevant to the user's request. It's good to read as many files as possible in between layers as this will give you more context on the user request.
-
-Continue to spawn layers of agents until have completed the user's request or require more information from the user.
-
-## Spawning agents guidelines
-
-
-- **Sequence agents properly:** Keep in mind dependencies when spawning different agents. Don't spawn agents in parallel that depend on each other. Be conservative sequencing agents so they can build on each other's insights:
-  - **Task researcher:** For medium to complex requests, you should first spawn a task-researcher agent by itself to gather context about the user's request. Spawn this before any other agents.
-  - Spawn file pickers, code-searcher, directory-lister, glob-matcher, commanders, and researchers before making edits.
-  - Spawn generate-plan agent after you have gathered all the context you need (and not before!).
-  - Only make edits after generating a plan.
-  - Code reviewers/validators should be spawned after you have made your edits.
-- **No need to include context:** When prompting an agent, realize that many agents can already see the entire conversation history, so you can be brief in prompting them without needing to include context.
-- **Don't spawn code reviewers/validators for trivial changes or quick follow-ups:** You should spawn the code reviewer/validator for most changes, but not for little changes or simple follow-ups.
-
-# Core Mandates
-
-- **Tone:** Adopt a professional, direct, and concise tone suitable for a CLI environment.
-- **Understand first, act second:** Always gather context and read relevant files BEFORE editing files.
-- **Quality over speed:** Prioritize correctness over appearing productive. Fewer, well-informed agents are better than many rushed ones.
-- **Validate assumptions:** Use researchers, file pickers, and the read_files tool to verify assumptions about libraries and APIs before implementing.
-- **Proactiveness:** Fulfill the user's request thoroughly, including reasonable, directly implied follow-up actions.
-- **Be careful about terminal commands:** Be careful about instructing subagents to run terminal commands that could be destructive or have effects that are hard to undo (e.g. git push, running scripts that could alter production environments, installing packages globally, etc). Don't do any of these unless the user explicitly asks you to.
-- **Do what the user asks:** If the user asks you to do something, even running a risky terminal command, do it.
-- **Make at least one tool call in every step:** You *must* make at least one tool call (with "<codebuff_tool_call>" tags) in every step unless you are done with the task. If you don't, you will be cut off by the system and the task will be incomplete.
-
-# Code Editing Mandates
-
-- **Conventions:** Rigorously adhere to existing project conventions when reading or modifying code. Analyze surrounding code, tests, and configuration first.
-- **Libraries/Frameworks:** NEVER assume a library/framework is available or appropriate. Verify its established usage within the project (check imports, configuration files like 'package.json', 'Cargo.toml', 'requirements.txt', 'build.gradle', etc., or observe neighboring files) before employing it.
-- **Style & Structure:** Mimic the style (formatting, naming), structure, framework choices, typing, and architectural patterns of existing code in the project.
-- **Idiomatic Changes:** When editing, understand the local context (imports, functions/classes) to ensure your changes integrate naturally and idiomatically.
-- **No new code comments:** Do not add any new comments while writing code, unless they were preexisting comments (keep those!) or unless the user asks you to add comments!
-- **Minimal Changes:** Make as few changes as possible to satisfy the user request! Don't go beyond what the user has asked for.
-- **Code Reuse:** Always reuse helper functions, components, classes, etc., whenever possible! Don't reimplement what already exists elsewhere in the codebase.
-- **Front end development** We want to make the UI look as good as possible. Don't hold back. Give it your all.
-    - Include as many relevant features and interactions as possible
-    - Add thoughtful details like hover states, transitions, and micro-interactions
-    - Apply design principles: hierarchy, contrast, balance, and movement
-    - Create an impressive demonstration showcasing web development capabilities
--  **Refactoring Awareness:** Whenever you modify an exported symbol like a function or class or variable, you should find and update all the references to it appropriately.
--  **Package Management:** When adding new packages, use the run_terminal_command tool to install the package rather than editing the package.json file with a guess at the version number to use (or similar for other languages). This way, you will be sure to have the latest version of the package. Do not install packages globally unless asked by the user (e.g. Don't run \`npm install -g <package-name>\`). Always try to use the package manager associated with the project (e.g. it might be \`pnpm\` or \`bun\` or \`yarn\` instead of \`npm\`, or similar for other languages).
--  **Code Hygiene:** Make sure to leave things in a good state:
-    - Don't forget to add any imports that might be needed
-    - Remove unused variables, functions, and files as a result of your changes.
-    - If you added files or functions meant to replace existing code, then you should also remove the previous code.
-- **Edit multiple files at once:** When you edit files, you must make as many tool calls as possible in a single message. This is faster and much more efficient than making all the tool calls in separate messages. It saves users thousands of dollars in credits if you do this!
-
-${PLACEHOLDER.FILE_TREE_PROMPT_SMALL}
-${PLACEHOLDER.KNOWLEDGE_FILES_CONTENTS}
-
-# Initial Git Changes
-
-The following is the state of the git repository at the start of the conversation. Note that it is not updated to reflect any subsequent changes made by the user or the agents.
-
-${PLACEHOLDER.GIT_CHANGES_PROMPT}
-`,
-
     instructionsPrompt: `Orchestrate the completion of the user's request using your specialized sub-agents. Take your time and be comprehensive.
     
 ## Example workflow
 
 The user asks you to implement a new feature. You respond in multiple steps:
 
-1. Spawn a task-researcher agent to research the task and get key facts and insights.
-2. Use the str_replace or write_file tool to make the changes.
-3. Spawn a code-reviewer to review the changes. Consider making changes suggested by the code-reviewer.
-4. Spawn a validator to run validation checks (tests, typechecks, etc.) to ensure the changes are correct.
+1. Spawn a task-researcher2 agent by itself to research the task and get key facts and insights.
+2. Spawn a planner-pro-with-files-input agent to generate a plan for the changes.
+3. Spawn a base2-gpt-5-worker agent to do the editing.
+4. Test your changes by running appropriate validation commands for the project (e.g. typechecks, tests, lints, etc.). You may have to explore the project to find the appropriate commands.
+5. Inform the user that you have completed the task in one sentence or a few short bullet points without a final summary. Don't create any summary markdown files, unless asked by the user.
 
-You may not need to spawn the task-researcher if the user's request is trivial or if you have already gathered all the information you need from the conversation history.
+You may not need to spawn the task-researcher2 if the user's request is trivial or if you have already gathered all the information you need from the conversation history.
 `,
 
-    stepPrompt: `Don't forget to spawn agents that could help, especially: the task-researcher to research the task, code-reviewer to review changes, and the validator to run validation commands.`,
+    stepPrompt: `Don't forget to spawn agents that could help, especially: the task-researcher2 to research the task, and base2-gpt-5-worker to do the editing. After completing the user request, summarize your changes in a sentence or a few short bullet points. Do not create any summary markdown files, unless asked by the user. Then, end your turn.`,
 
     handleSteps: function* ({ params, logger }) {
       let steps = 0
@@ -148,10 +88,10 @@ You may not need to spawn the task-researcher if the user's request is trivial o
         const { stepsComplete, agentState } = yield 'STEP'
         if (stepsComplete) break
 
-        // Check tool results for spawning of a task researcher...
-        // If found, reset messages to only include the task researcher's result and read the relevant files!
+        // Check last tool result for spawning of a task researcher...
         const spawnAgentsToolResults = agentState.messageHistory
           .filter((message) => message.role === 'tool')
+          .slice(-1)
           .filter((message) => message.content.toolName === 'spawn_agents')
           .map((message) => message.content.output)
           .flat()
@@ -162,36 +102,84 @@ You may not need to spawn the task-researcher if the user's request is trivial o
         }[]
 
         const taskResearcherResult = spawnAgentsToolResults?.find(
-          (result) => result.agentType === 'task-researcher',
+          (result) => result.agentType === 'task-researcher2',
         )
         if (taskResearcherResult) {
+          // If task researcher was spawned:
+          // 1. Reset context from the last user message.
+          // 2. Read all the relevant files using the read_files tool.
+          // 3. Spawn a planner-pro agent with the appropriate context (prompt, research report, and relevantFiles).
+          // 4. Spawn a base2-gpt-5-worker agent to implement the plan.
+          // 5. Step all
+
           const taskResearcherOutput = taskResearcherResult.value.value as {
-            analysis: string
-            keyFacts: string[]
+            report: string
             relevantFiles: string[]
-            userPrompt: string
-          }
-          const initialMessage = `<research>${taskResearcherOutput.keyFacts.join('\n')}</research>${taskResearcherOutput.userPrompt}`
-          const message = {
-            role: 'user',
-            content: initialMessage,
           }
-          const instructionsMessage = agentState.messageHistory.findLast(
+
+          const lastUserMessageIndex = agentState.messageHistory.findLastIndex(
             (message) =>
               message.role === 'user' &&
-              message.keepLastTags?.[0] === 'INSTRUCTIONS_PROMPT',
+              (typeof message.content === 'string'
+                ? message.content
+                : message.content[0].type === 'text'
+                  ? message.content[0].text
+                  : ''
+              ).includes('<user_message>'),
           )
+          const lastUserMessage = agentState.messageHistory[
+            lastUserMessageIndex
+          ] as UserMessage
+          const userPrompt = !lastUserMessage
+            ? ''
+            : typeof lastUserMessage.content === 'string'
+              ? lastUserMessage.content
+              : lastUserMessage.content
+                  .filter((content) => content.type === 'text')
+                  .map((content) => content.text)
+                  .join('')
+          const afterOpenTag = userPrompt.split('<user_message>')[1] ?? ''
+          const userPromptText = afterOpenTag
+            .split('</user_message>')[0]
+            .trim()
+
+          const newMessages =
+            agentState.messageHistory.slice(lastUserMessageIndex)
           yield {
             toolName: 'set_messages',
             input: {
-              messages: [message, instructionsMessage],
+              messages: newMessages,
             },
             includeToolCall: false,
           } satisfies ToolCall<'set_messages'>
+
           yield {
             toolName: 'read_files',
             input: { paths: taskResearcherOutput.relevantFiles },
           } satisfies ToolCall<'read_files'>
+
+          yield {
+            toolName: 'spawn_agents',
+            input: {
+              agents: [
+                {
+                  agent_type: 'planner-pro-with-files-input',
+                  prompt: userPromptText,
+                  params: {
+                    researchReport: taskResearcherOutput.report,
+                    relevantFiles: taskResearcherOutput.relevantFiles,
+                  },
+                },
+              ],
+            },
+          } satisfies ToolCall<'spawn_agents'>
+
+          yield {
+            toolName: 'spawn_agent_inline',
+            input: {
+              agent_type: 'base2-gpt-5-worker',
+            },
+          }
         }
         // Continue loop!
       }
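
The user-prompt extraction in the hunk above can be sketched as a standalone helper. This is a minimal illustration, not the code in this commit: `ContentPart`, `MessageContent`, `contentToText`, and `extractUserMessage` are hypothetical names, and the content-part shape is an assumption modeled on the `type === 'text'` checks in the diff.

```typescript
// Hypothetical message-content shape, assumed from the checks in the diff.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; image: string }
type MessageContent = string | ContentPart[]

// Flatten message content to plain text, keeping only the text parts.
function contentToText(content: MessageContent): string {
  if (typeof content === 'string') return content
  return content
    .filter(
      (part): part is { type: 'text'; text: string } => part.type === 'text',
    )
    .map((part) => part.text)
    .join('')
}

// Return the trimmed text inside the first <user_message>...</user_message>
// pair, or an empty string when no complete pair is present.
function extractUserMessage(content: MessageContent): string {
  const text = contentToText(content)
  const match = text.match(/<user_message>([\s\S]*?)<\/user_message>/)
  return match?.[1]?.trim() ?? ''
}
```

Unlike a chained `split`, this sketch returns an empty string when a tag is missing rather than throwing, and it also returns an empty string when the closing tag is absent. Whether that fallback is the right behavior for the orchestrator is a judgment call.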
diff --git a/.agents/planners/planner-pro-with-files-input.ts b/.agents/planners/planner-pro-with-files-input.ts
@@ -0,0 +1,84 @@
+import {
+  PLACEHOLDER,
+  type SecretAgentDefinition,
+} from '../types/secret-agent-definition'
+import { publisher } from '../constants'
+
+const definition: SecretAgentDefinition = {
+  id: 'planner-pro-with-files-input',
+  model: 'openai/gpt-5-pro',
+  publisher,
+  displayName: 'Planner Pro',
+  spawnerPrompt:
+    'Uses deep thinking to generate an implementation plan for a user request.',
+  inputSchema: {
+    prompt: {
+      type: 'string',
+      description: 'A coding task to complete',
+    },
+    params: {
+      type: 'object',
+      properties: {
+        researchReport: {
+          type: 'string',
+          description: 'A research report on the user request',
+        },
+        relevantFiles: {
+          type: 'array',
+          items: {
+            type: 'string',
+            description:
+              'The path to a file that is relevant to the user request',
+          },
+          description:
+            'The paths to files that are relevant to the user request',
+        },
+      },
+      required: ['relevantFiles'],
+    },
+  },
+  outputMode: 'last_message',
+  spawnableAgents: [],
+
+  systemPrompt: `You are the planner-pro agent. You are an expert software engineer who is good at formulating surprisingly simple and clear plans.
+
+IMPORTANT: You do not have access to any tools. You can only analyze and write out plans. Do not attempt to use any tools! Your goal is to generate the best plan for the user's request.
+
+${PLACEHOLDER.FILE_TREE_PROMPT_SMALL}`,
+
+  instructionsPrompt: `Your task is to output the best plan to accomplish the user's request in a single message. Do not call any tools.
+
+The plan should be an implementation plan for the coding agent to act on to satisfy the user's request. So you can give instructions to the coding agent that include at a high level what files to change and what commands/tools to run.
+
+No need to write out all the code that should be changed. Just focus on the trickiest parts, the key decisions, and sketch the rest so that a smart coding agent can fill in the details.
+
+You can excerpt key sections of the code using markdown code blocks, e.g.
+
+path/to/file.ts
+\`\`\`
+// ... existing code ...
+[this is the key section of code]
+// ... existing code ...
+\`\`\`
+
+Here is a priority-ordered list of key principles for the plan. You must:
+- Satisfy all the original user requirements to the greatest extent possible.
+- Create the simplest and most straightforward plan to implement.
+- Make the plan maintainable, clear, and easy to understand.
+- Include the fewest dependencies and moving parts.
+- Reuse existing helper functions and other code whenever possible.
+- Modify the fewest files.
+
+Please output the plan text itself, without labels or meta-commentary.`,
+
+  handleSteps: function* ({ params }) {
+    yield {
+      toolName: 'read_files',
+      input: { paths: params?.relevantFiles || [] },
+    }
+
+    yield 'STEP_ALL'
+  },
+}
+
+export default definition
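
The `handleSteps` generator added above yields one tool call and then `'STEP_ALL'`. A rough sketch of that shape, assuming a simplified step type, is shown below. `Step`, `PlannerParams`, and `plannerSteps` are hypothetical names introduced for illustration, and the real agent runtime that consumes these yields is not modeled here.

```typescript
// Hypothetical, simplified step type; the real runtime's types may differ.
type Step =
  | { toolName: 'read_files'; input: { paths: string[] } }
  | 'STEP_ALL'

// Assumed parameter shape, mirroring the inputSchema in the diff.
type PlannerParams = { researchReport?: string; relevantFiles?: string[] }

// Read the relevant files first, then hand control to the model.
function* plannerSteps(params?: PlannerParams): Generator<Step, void, unknown> {
  yield {
    toolName: 'read_files',
    input: { paths: params?.relevantFiles ?? [] },
  }
  yield 'STEP_ALL'
}

// Collect the yielded steps without executing any tools.
const plannedSteps = [...plannerSteps({ relevantFiles: ['src/index.ts'] })]
```

Spreading the generator only lists the steps it would request. In this sketch nothing is read from disk and no model is called.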
diff --git a/.agents/researcher/task-researcher2.ts b/.agents/researcher/task-researcher2.ts