Skip to content

Commit 1202238

Browse files
authored
🤖 fix: improve status_set tool description and display logic (#466)
## Overview Fixes two critical bugs with the `status_set` tool and refines completion status examples. ## Problems Fixed ### 1. Validation failures showed as 'completed' ✓ instead of 'failed' ✗ When `status_set` validation failed (e.g., invalid emoji), the tool displayed 'completed' status even though it failed. This made users think validation was silently failing, especially when the status didn't appear in the sidebar. **Root cause:** Tool status determination only checked `part.state === 'output-available'` (meaning the tool returned a result), but didn't check whether that result indicated success or failure via `result.success`. ### 2. Status didn't persist across page reloads or workspace switches Workspaces with successful `status_set` calls would lose their status after reload because `loadHistoricalMessages()` didn't reconstruct derived state from tool results. **Root cause:** Derived state (agentStatus, currentTodos) was only updated during live streaming events, not when loading historical messages. ## Solutions ### 1. Show 'failed' status for validation errors - Enhanced status determination to check `result.success` for tools returning `{ success: boolean }` - Extracted `hasSuccessResult()` and `hasFailureResult()` helpers to eliminate duplication - Display error message in StatusSetToolCall UI when validation fails ### 2. Shared tool result processing Extracted `processToolResult()` as single source of truth for updating derived state. Called from: 1. **`handleToolCallEnd()`** - live streaming events 2. **`loadHistoricalMessages()`** - historical message loading This ensures agentStatus and currentTodos are reconstructed uniformly whether processing live events or historical snapshots. ### 3. Refined completion status examples Replaced generic examples with outcome-specific ones that show variety: - ✅ Success: 'PR checks pass and ready to merge' - ❌ Failure: 'CreateWorkspace Tests failed' - ⚠️ Warning: 'Encountered serious issue with design' Encourages agents to communicate actual outcomes, not just completion. ## Implementation Details **Status determination (StreamingMessageAggregator.ts:772-789):** ```typescript if (part.state === "output-available") { status = hasFailureResult(part.output) ? "failed" : "completed"; } ``` **Shared result processing (StreamingMessageAggregator.ts:530-552):** ```typescript private processToolResult(toolName: string, input: unknown, output: unknown): void { if (toolName === "todo_write" && hasSuccessResult(output)) { // Update todos... } if (toolName === "status_set" && hasSuccessResult(output)) { // Update agentStatus... } } ``` **Historical message processing (StreamingMessageAggregator.ts:199-213):** ```typescript loadHistoricalMessages(messages: CmuxMessage[]): void { for (const message of messages) { this.messages.set(message.id, message); if (message.role === "assistant") { for (const part of message.parts) { if (isDynamicToolPart(part) && part.state === "output-available") { this.processToolResult(part.toolName, part.input, part.output); } } } } } ``` ## Testing - Added 5 new unit tests for status persistence and validation display - All 842 tests pass - Typecheck and lint pass **New tests:** 1. Show 'failed' status when validation fails 2. Show 'completed' status when validation succeeds 3. Reconstruct agentStatus from historical messages 4. Use most recent status when loading multiple messages 5. Don't reconstruct from failed status_set ## Before/After ### Validation Failure Display **Before:** ``` status_set({ emoji: "not-emoji", message: "test" }) → Shows: completed ✓ → Sidebar: No status appears → User: "Is validation failing silently?" 🤔 ``` **After:** ``` status_set({ emoji: "not-emoji", message: "test" }) → Shows: failed ✗ (emoji must be a single emoji character) → Sidebar: No status appears (correct - validation failed) → User: Clear feedback! ✨ ``` ### Status Persistence **Before:** ``` 1. Agent sets status: "🔍 Analyzing code" 2. Reload page or switch workspace 3. Status disappears (even though it was successful) ``` **After:** ``` 1. Agent sets status: "🔍 Analyzing code" 2. Reload page or switch workspace 3. Status persists ✅ ``` ## Architecture Benefits 1. **Single source of truth**: One method processes tool results, regardless of source 2. **Leverages existing data**: Tool parts already contain everything we need 3. **No special reconstruction logic**: Just process parts uniformly 4. **Easier to extend**: Adding new derived state only requires updating `processToolResult()` 5. **Matches architecture**: Historical messages are complete snapshots _Generated with `cmux`_
1 parent e7e0f37 commit 1202238

File tree

4 files changed

+336
-49
lines changed

4 files changed

+336
-49
lines changed

src/components/tools/StatusSetToolCall.tsx

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,11 +12,17 @@ interface StatusSetToolCallProps {
1212

1313
export const StatusSetToolCall: React.FC<StatusSetToolCallProps> = ({
1414
args,
15-
result: _result,
15+
result,
1616
status = "pending",
1717
}) => {
1818
const statusDisplay = getStatusDisplay(status);
1919

20+
// Show error message if validation failed
21+
const errorMessage =
22+
status === "failed" && result && typeof result === "object" && "error" in result
23+
? String(result.error)
24+
: undefined;
25+
2026
return (
2127
<ToolContainer expanded={false}>
2228
<ToolHeader>
@@ -25,6 +31,7 @@ export const StatusSetToolCall: React.FC<StatusSetToolCallProps> = ({
2531
<Tooltip>status_set</Tooltip>
2632
</TooltipWrapper>
2733
<span className="text-muted-foreground italic">{args.message}</span>
34+
{errorMessage && <span className="text-error-foreground text-sm">({errorMessage})</span>}
2835
<StatusIndicator status={status}>{statusDisplay}</StatusIndicator>
2936
</ToolHeader>
3037
</ToolContainer>

src/utils/messages/StreamingMessageAggregator.status.test.ts

Lines changed: 242 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -199,7 +199,7 @@ describe("StreamingMessageAggregator - Agent Status", () => {
199199
expect(aggregator.getAgentStatus()).toBeUndefined();
200200
});
201201

202-
it("should clear agent status on stream-start (different from TODO behavior)", () => {
202+
it("should clear agent status when new user message arrives", () => {
203203
const aggregator = new StreamingMessageAggregator("2024-01-01T00:00:00.000Z");
204204

205205
// Start first stream and set status
@@ -245,16 +245,253 @@ describe("StreamingMessageAggregator - Agent Status", () => {
245245
// Status persists after stream ends
246246
expect(aggregator.getAgentStatus()?.message).toBe("First task");
247247

248-
// Start a NEW stream - status should be cleared
248+
// User sends a NEW message - status should be cleared
249+
const newUserMessage = {
250+
id: "msg2",
251+
role: "user" as const,
252+
parts: [{ type: "text" as const, text: "What's next?" }],
253+
metadata: { timestamp: Date.now(), historySequence: 2 },
254+
};
255+
aggregator.handleMessage(newUserMessage);
256+
257+
// Status should be cleared on new user message
258+
expect(aggregator.getAgentStatus()).toBeUndefined();
259+
});
260+
261+
it("should show 'failed' status in UI when status_set validation fails", () => {
262+
const aggregator = new StreamingMessageAggregator("2024-01-01T00:00:00.000Z");
263+
const messageId = "msg1";
264+
265+
// Start a stream
266+
aggregator.handleStreamStart({
267+
type: "stream-start",
268+
workspaceId: "workspace1",
269+
messageId,
270+
model: "test-model",
271+
historySequence: 1,
272+
});
273+
274+
// Add a status_set tool call with invalid emoji
275+
aggregator.handleToolCallStart({
276+
type: "tool-call-start",
277+
workspaceId: "workspace1",
278+
messageId,
279+
toolCallId: "tool1",
280+
toolName: "status_set",
281+
args: { emoji: "not-an-emoji", message: "test" },
282+
tokens: 10,
283+
timestamp: Date.now(),
284+
});
285+
286+
// Complete with validation failure
287+
aggregator.handleToolCallEnd({
288+
type: "tool-call-end",
289+
workspaceId: "workspace1",
290+
messageId,
291+
toolCallId: "tool1",
292+
toolName: "status_set",
293+
result: { success: false, error: "emoji must be a single emoji character" },
294+
});
295+
296+
// End the stream to finalize message
297+
aggregator.handleStreamEnd({
298+
type: "stream-end",
299+
workspaceId: "workspace1",
300+
messageId,
301+
metadata: { model: "test-model" },
302+
parts: [],
303+
});
304+
305+
// Check that the tool message shows 'failed' status in the UI
306+
const displayedMessages = aggregator.getDisplayedMessages();
307+
const toolMessage = displayedMessages.find((m) => m.type === "tool");
308+
expect(toolMessage).toBeDefined();
309+
expect(toolMessage?.type).toBe("tool");
310+
if (toolMessage?.type === "tool") {
311+
expect(toolMessage.status).toBe("failed");
312+
expect(toolMessage.toolName).toBe("status_set");
313+
}
314+
315+
// And status should NOT be updated in aggregator
316+
expect(aggregator.getAgentStatus()).toBeUndefined();
317+
});
318+
319+
it("should show 'completed' status in UI when status_set validation succeeds", () => {
320+
const aggregator = new StreamingMessageAggregator("2024-01-01T00:00:00.000Z");
321+
const messageId = "msg1";
322+
323+
// Start a stream
249324
aggregator.handleStreamStart({
250325
type: "stream-start",
251326
workspaceId: "workspace1",
252-
messageId: "msg2",
327+
messageId,
253328
model: "test-model",
254-
historySequence: 2,
329+
historySequence: 1,
330+
});
331+
332+
// Add a successful status_set tool call
333+
aggregator.handleToolCallStart({
334+
type: "tool-call-start",
335+
workspaceId: "workspace1",
336+
messageId,
337+
toolCallId: "tool1",
338+
toolName: "status_set",
339+
args: { emoji: "🔍", message: "Analyzing code" },
340+
tokens: 10,
341+
timestamp: Date.now(),
342+
});
343+
344+
// Complete successfully
345+
aggregator.handleToolCallEnd({
346+
type: "tool-call-end",
347+
workspaceId: "workspace1",
348+
messageId,
349+
toolCallId: "tool1",
350+
toolName: "status_set",
351+
result: { success: true, emoji: "🔍", message: "Analyzing code" },
352+
});
353+
354+
// End the stream to finalize message
355+
aggregator.handleStreamEnd({
356+
type: "stream-end",
357+
workspaceId: "workspace1",
358+
messageId,
359+
metadata: { model: "test-model" },
360+
parts: [],
255361
});
256362

257-
// Status should be cleared on new stream start
363+
// Check that the tool message shows 'completed' status in the UI
364+
const displayedMessages = aggregator.getDisplayedMessages();
365+
const toolMessage = displayedMessages.find((m) => m.type === "tool");
366+
expect(toolMessage).toBeDefined();
367+
expect(toolMessage?.type).toBe("tool");
368+
if (toolMessage?.type === "tool") {
369+
expect(toolMessage.status).toBe("completed");
370+
expect(toolMessage.toolName).toBe("status_set");
371+
}
372+
373+
// And status SHOULD be updated in aggregator
374+
const status = aggregator.getAgentStatus();
375+
expect(status).toBeDefined();
376+
expect(status?.emoji).toBe("🔍");
377+
expect(status?.message).toBe("Analyzing code");
378+
});
379+
380+
it("should reconstruct agentStatus when loading historical messages", () => {
381+
const aggregator = new StreamingMessageAggregator("2024-01-01T00:00:00.000Z");
382+
383+
// Create historical messages with a completed status_set tool call
384+
const historicalMessages = [
385+
{
386+
id: "msg1",
387+
role: "user" as const,
388+
parts: [{ type: "text" as const, text: "Hello" }],
389+
metadata: { timestamp: Date.now(), historySequence: 1 },
390+
},
391+
{
392+
id: "msg2",
393+
role: "assistant" as const,
394+
parts: [
395+
{ type: "text" as const, text: "Working on it..." },
396+
{
397+
type: "dynamic-tool" as const,
398+
toolCallId: "tool1",
399+
toolName: "status_set",
400+
state: "output-available" as const,
401+
input: { emoji: "🔍", message: "Analyzing code" },
402+
output: { success: true, emoji: "🔍", message: "Analyzing code" },
403+
timestamp: Date.now(),
404+
},
405+
],
406+
metadata: { timestamp: Date.now(), historySequence: 2 },
407+
},
408+
];
409+
410+
// Load historical messages
411+
aggregator.loadHistoricalMessages(historicalMessages);
412+
413+
// Status should be reconstructed from the historical tool call
414+
const status = aggregator.getAgentStatus();
415+
expect(status).toBeDefined();
416+
expect(status?.emoji).toBe("🔍");
417+
expect(status?.message).toBe("Analyzing code");
418+
});
419+
420+
it("should use most recent status_set when loading multiple historical messages", () => {
421+
const aggregator = new StreamingMessageAggregator("2024-01-01T00:00:00.000Z");
422+
423+
// Create historical messages with multiple status_set calls
424+
const historicalMessages = [
425+
{
426+
id: "msg1",
427+
role: "assistant" as const,
428+
parts: [
429+
{
430+
type: "dynamic-tool" as const,
431+
toolCallId: "tool1",
432+
toolName: "status_set",
433+
state: "output-available" as const,
434+
input: { emoji: "🔍", message: "First status" },
435+
output: { success: true, emoji: "🔍", message: "First status" },
436+
timestamp: Date.now(),
437+
},
438+
],
439+
metadata: { timestamp: Date.now(), historySequence: 1 },
440+
},
441+
{
442+
id: "msg2",
443+
role: "assistant" as const,
444+
parts: [
445+
{
446+
type: "dynamic-tool" as const,
447+
toolCallId: "tool2",
448+
toolName: "status_set",
449+
state: "output-available" as const,
450+
input: { emoji: "📝", message: "Second status" },
451+
output: { success: true, emoji: "📝", message: "Second status" },
452+
timestamp: Date.now(),
453+
},
454+
],
455+
metadata: { timestamp: Date.now(), historySequence: 2 },
456+
},
457+
];
458+
459+
// Load historical messages
460+
aggregator.loadHistoricalMessages(historicalMessages);
461+
462+
// Should use the most recent (last processed) status
463+
const status = aggregator.getAgentStatus();
464+
expect(status?.emoji).toBe("📝");
465+
expect(status?.message).toBe("Second status");
466+
});
467+
468+
it("should not reconstruct status from failed status_set in historical messages", () => {
469+
const aggregator = new StreamingMessageAggregator("2024-01-01T00:00:00.000Z");
470+
471+
// Create historical message with failed status_set
472+
const historicalMessages = [
473+
{
474+
id: "msg1",
475+
role: "assistant" as const,
476+
parts: [
477+
{
478+
type: "dynamic-tool" as const,
479+
toolCallId: "tool1",
480+
toolName: "status_set",
481+
state: "output-available" as const,
482+
input: { emoji: "not-emoji", message: "test" },
483+
output: { success: false, error: "emoji must be a single emoji character" },
484+
timestamp: Date.now(),
485+
},
486+
],
487+
metadata: { timestamp: Date.now(), historySequence: 1 },
488+
},
489+
];
490+
491+
// Load historical messages
492+
aggregator.loadHistoricalMessages(historicalMessages);
493+
494+
// Status should remain undefined (failed validation)
258495
expect(aggregator.getAgentStatus()).toBeUndefined();
259496
});
260497
});

0 commit comments

Comments
 (0)