You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
🤖 fix: improve status_set tool description and display logic (#466)
## Overview
Fixes two critical bugs with the `status_set` tool and refines
completion status examples.
## Problems Fixed
### 1. Validation failures showed as 'completed' ✓ instead of 'failed' ✗
When `status_set` validation failed (e.g., invalid emoji), the tool
displayed 'completed' status even though it failed. This made users
think validation was silently failing, especially when the status didn't
appear in the sidebar.
**Root cause:** Tool status determination only checked `part.state ===
'output-available'` (meaning the tool returned a result), but didn't
check whether that result indicated success or failure via
`result.success`.
### 2. Status didn't persist across page reloads or workspace switches
Workspaces with successful `status_set` calls would lose their status
after reload because `loadHistoricalMessages()` didn't reconstruct
derived state from tool results.
**Root cause:** Derived state (agentStatus, currentTodos) was only
updated during live streaming events, not when loading historical
messages.
## Solutions
### 1. Show 'failed' status for validation errors
- Enhanced status determination to check `result.success` for tools
returning `{ success: boolean }`
- Extracted `hasSuccessResult()` and `hasFailureResult()` helpers to
eliminate duplication
- Display error message in StatusSetToolCall UI when validation fails
### 2. Shared tool result processing
Extracted `processToolResult()` as single source of truth for updating
derived state. Called from:
1. **`handleToolCallEnd()`** - live streaming events
2. **`loadHistoricalMessages()`** - historical message loading
This ensures agentStatus and currentTodos are reconstructed uniformly
whether processing live events or historical snapshots.
### 3. Refined completion status examples
Replaced generic examples with outcome-specific ones that show variety:
- ✅ Success: 'PR checks pass and ready to merge'
- ❌ Failure: 'CreateWorkspace Tests failed'
- ⚠️ Warning: 'Encountered serious issue with design'
Encourages agents to communicate actual outcomes, not just completion.
## Implementation Details
**Status determination (StreamingMessageAggregator.ts:772-789):**
```typescript
if (part.state === "output-available") {
status = hasFailureResult(part.output) ? "failed" : "completed";
}
```
**Shared result processing (StreamingMessageAggregator.ts:530-552):**
```typescript
private processToolResult(toolName: string, input: unknown, output: unknown): void {
if (toolName === "todo_write" && hasSuccessResult(output)) {
// Update todos...
}
if (toolName === "status_set" && hasSuccessResult(output)) {
// Update agentStatus...
}
}
```
**Historical message processing
(StreamingMessageAggregator.ts:199-213):**
```typescript
loadHistoricalMessages(messages: CmuxMessage[]): void {
for (const message of messages) {
this.messages.set(message.id, message);
if (message.role === "assistant") {
for (const part of message.parts) {
if (isDynamicToolPart(part) && part.state === "output-available") {
this.processToolResult(part.toolName, part.input, part.output);
}
}
}
}
}
```
## Testing
- Added 5 new unit tests for status persistence and validation display
- All 842 tests pass
- Typecheck and lint pass
**New tests:**
1. Show 'failed' status when validation fails
2. Show 'completed' status when validation succeeds
3. Reconstruct agentStatus from historical messages
4. Use most recent status when loading multiple messages
5. Don't reconstruct from failed status_set
## Before/After
### Validation Failure Display
**Before:**
```
status_set({ emoji: "not-emoji", message: "test" })
→ Shows: completed ✓
→ Sidebar: No status appears
→ User: "Is validation failing silently?" 🤔
```
**After:**
```
status_set({ emoji: "not-emoji", message: "test" })
→ Shows: failed ✗ (emoji must be a single emoji character)
→ Sidebar: No status appears (correct - validation failed)
→ User: Clear feedback! ✨
```
### Status Persistence
**Before:**
```
1. Agent sets status: "🔍 Analyzing code"
2. Reload page or switch workspace
3. Status disappears (even though it was successful)
```
**After:**
```
1. Agent sets status: "🔍 Analyzing code"
2. Reload page or switch workspace
3. Status persists ✅
```
## Architecture Benefits
1. **Single source of truth**: One method processes tool results,
regardless of source
2. **Leverages existing data**: Tool parts already contain everything we
need
3. **No special reconstruction logic**: Just process parts uniformly
4. **Easier to extend**: Adding new derived state only requires updating
`processToolResult()`
5. **Matches architecture**: Historical messages are complete snapshots
_Generated with `cmux`_
0 commit comments