Commit e3a6450

evals: switch to grok 4 for prompter
1 parent e3db58f commit e3a6450

2 files changed, +5 -8 lines changed

evals/git-evals/run-git-evals.ts

Lines changed: 4 additions & 7 deletions
@@ -4,7 +4,6 @@ import path from 'path'
 
 import { disableLiveUserInputCheck } from '@codebuff/backend/live-user-inputs'
 import { promptAiSdkStructured } from '@codebuff/backend/llm-apis/vercel-ai-sdk/ai-sdk'
-import { models } from '@codebuff/common/old-constants'
 import { withTimeout } from '@codebuff/common/util/promise'
 import { generateCompactId } from '@codebuff/common/util/string'
 import { cloneDeep } from 'lodash'
@@ -136,14 +135,16 @@ Note that files can only be changed with tools. If no tools are called, no files
 You must decide whether to:
 1. 'continue' - Generate a follow-up prompt for Codebuff
 2. 'complete' - The implementation is done and fully satisfies the spec, including tests, documentation, and any other relevant artifacts
+- In this case, just put an empty string for next_prompt
 3. 'halt' - The implementation is off track and unlikely to be completed within ${MAX_ATTEMPTS - attempts} more attempts
+- In this case, just put an empty string for next_prompt
 
 If deciding to continue, include a clear, focused prompt for Codebuff in next_prompt. Note that Codebuff does not have access to the spec, so you must describe the changes you want Codebuff to make in a way that is clear and concise.
 Explain your reasoning in detail.`,
 },
 ],
 schema: AgentDecisionSchema,
-model: models.openrouter_gemini2_5_flash,
+model: 'x-ai/grok-4-fast:free',
 clientSessionId,
 fingerprintId,
 userInputId: generateCompactId(),
@@ -160,13 +161,9 @@ Explain your reasoning in detail.`,
 console.log('Agent reasoning:', agentResponse.reasoning)
 console.log('Agent prompt:', agentResponse.next_prompt)
 
-if (agentResponse.decision === 'continue' && !agentResponse.next_prompt) {
-agentResponse.next_prompt = 'continue'
-}
-
 // If continuing, run CodeBuff with the agent's prompt
 if (agentResponse.decision === 'continue') {
-const prompt = agentResponse.next_prompt!
+const prompt = agentResponse.next_prompt || 'continue'
 
 // Use loopMainPrompt with timeout wrapper
 const codebuffResult = await withTimeout(
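This change removes the explicit empty-prompt check and relies on `||`, which treats `''` as falsy. The following minimal sketch isolates that fallback. `AgentDecision`, `resolvePrompt`, and `resolvePromptNullish` are hypothetical local names, not part of the codebase. The sketch shows why `||` (rather than `??`) still yields a usable prompt when a 'continue' decision arrives with an empty `next_prompt`.

```typescript
// Hypothetical local mirror of the decision shape; the real type is derived
// from AgentDecisionSchema in evals/git-evals/types.ts.
type AgentDecision = {
  decision: 'continue' | 'complete' | 'halt'
  reasoning: string
  next_prompt: string
}

// Mirrors the commit's fallback: `||` replaces any falsy value, including '',
// so an empty next_prompt on a 'continue' decision becomes 'continue'.
function resolvePrompt(agentResponse: AgentDecision): string | undefined {
  if (agentResponse.decision !== 'continue') {
    return undefined // complete/halt: no follow-up prompt is sent
  }
  return agentResponse.next_prompt || 'continue'
}

// For contrast: `??` only replaces null/undefined, so '' passes through unchanged.
function resolvePromptNullish(agentResponse: AgentDecision): string | undefined {
  if (agentResponse.decision !== 'continue') {
    return undefined
  }
  return agentResponse.next_prompt ?? 'continue'
}

console.log(resolvePrompt({ decision: 'continue', reasoning: 'r', next_prompt: '' }))
// prints "continue"
```

With `next_prompt` now always a string, `??` would never fire, so `||` is the operator that preserves the old behavior of falling back on an empty prompt.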

evals/git-evals/types.ts

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ export interface FullEvalLog {
 export const AgentDecisionSchema = z.object({
 decision: z.enum(['continue', 'complete', 'halt']),
 reasoning: z.string(),
-next_prompt: z.string().optional(),
+next_prompt: z.string(),
 })
 
 export const CommitSelectionSchema = z.object({
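Changing `next_prompt` from `z.string().optional()` to `z.string()` means a decision that omits the field now fails validation instead of parsing with `undefined`. The following dependency-free sketch approximates that shape check. `parseAgentDecision` is a hypothetical helper for illustration only; the actual validation is performed by the zod schema in `types.ts`.

```typescript
// Hypothetical mirror of AgentDecisionSchema's shape, without zod.
type AgentDecision = {
  decision: 'continue' | 'complete' | 'halt'
  reasoning: string
  next_prompt: string
}

const DECISIONS: readonly string[] = ['continue', 'complete', 'halt']

function parseAgentDecision(value: unknown): AgentDecision {
  if (typeof value !== 'object' || value === null) {
    throw new Error('expected an object')
  }
  const v = value as Record<string, unknown>
  if (typeof v.decision !== 'string' || !DECISIONS.includes(v.decision)) {
    throw new Error('decision must be continue, complete, or halt')
  }
  if (typeof v.reasoning !== 'string') {
    throw new Error('reasoning must be a string')
  }
  // Previously optional: a missing next_prompt is now rejected, so
  // complete/halt responses must send '' rather than omit the field.
  if (typeof v.next_prompt !== 'string') {
    throw new Error('next_prompt must be a string')
  }
  return {
    decision: v.decision as AgentDecision['decision'],
    reasoning: v.reasoning,
    next_prompt: v.next_prompt,
  }
}
```

The required field lines up with the prompter instructions added in `run-git-evals.ts`, which tell the model to put an empty string in `next_prompt` for 'complete' and 'halt'.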
