Commit ee62b00

🤖 Improve integration test diagnostics for flaky tool policy tests (#46)
## Problem

The tool policy integration tests are flaky in CI, timing out waiting for `stream-end` events with no useful diagnostic information. When they fail, we only see:

```
Expected: true
Received: false
  at assertStreamSuccess (tests/ipcMain/helpers.ts:191:36)
```

This tells us nothing about what actually went wrong.

## Solution

This PR makes **general improvements to test helpers** that benefit all integration tests:

### 1. Enhanced `waitForEvent` helper

- Automatically logs stream-error details when timing out
- Shows error message and error type for debugging
- Makes all integration tests more debuggable

### 2. Enhanced `assertStreamSuccess` helper

- Replaced generic expect() calls with descriptive Error messages
- Shows all collected events when any assertion fails
- Distinguishes between different failure modes:
  - Stream didn't complete (no stream-end)
  - Stream errored (has stream-error)
  - Stream completed but missing final message
- Includes the actual error message in the failure output

### 3. Fixed tool policy tests to wait for completion

- Now wait for either `stream-end` OR `stream-error` (prevents timeout)
- Rely on improved helpers for diagnostics
- Much simpler and more maintainable

## Impact

These changes benefit **all integration tests**, not just tool policy tests. Any test using `waitForEvent` or `assertStreamSuccess` will now provide much better error messages when failing.

## Next Steps

Once this PR is merged and we see the tests fail in CI, we'll get detailed error logs that will help us:

- Identify if the AI is trying to use disabled tools
- See the actual API errors if any
- Understand the sequence of events leading to failure
- Fix the root cause instead of masking it with timeouts

_Generated with `cmux`_
1 parent 4006e3f commit ee62b00

2 files changed: +61 -5 lines changed

tests/ipcMain/helpers.ts

Lines changed: 43 additions & 3 deletions
@@ -142,6 +142,15 @@ export class EventCollector {
       `waitForEvent timeout: Expected "${eventType}" but got events: [${eventTypes.join(", ")}]`
     );

+    // If there was a stream-error, log the error details
+    const errorEvent = this.events.find((e) => "type" in e && e.type === "stream-error");
+    if (errorEvent && "error" in errorEvent) {
+      console.error("Stream error details:", errorEvent.error);
+      if ("errorType" in errorEvent) {
+        console.error("Stream error type:", errorEvent.errorType);
+      }
+    }
+
     return null;
   }

@@ -186,12 +195,43 @@ export function createEventCollector(

 /**
  * Assert that a stream completed successfully
+ * Provides helpful error messages when assertions fail
  */
 export function assertStreamSuccess(collector: EventCollector): void {
-  expect(collector.hasStreamEnd()).toBe(true);
-  expect(collector.hasError()).toBe(false);
+  const allEvents = collector.getEvents();
+  const eventTypes = allEvents.filter((e) => "type" in e).map((e) => (e as { type: string }).type);
+
+  // Check for stream-end
+  if (!collector.hasStreamEnd()) {
+    const errorEvent = allEvents.find((e) => "type" in e && e.type === "stream-error");
+    if (errorEvent && "error" in errorEvent) {
+      throw new Error(
+        `Stream did not complete successfully. Got stream-error: ${errorEvent.error}\n` +
+          `All events: [${eventTypes.join(", ")}]`
+      );
+    }
+    throw new Error(
+      `Stream did not emit stream-end event.\n` + `All events: [${eventTypes.join(", ")}]`
+    );
+  }
+
+  // Check for errors
+  if (collector.hasError()) {
+    const errorEvent = allEvents.find((e) => "type" in e && e.type === "stream-error");
+    const errorMsg = errorEvent && "error" in errorEvent ? errorEvent.error : "unknown";
+    throw new Error(
+      `Stream completed but also has error event: ${errorMsg}\n` +
+        `All events: [${eventTypes.join(", ")}]`
+    );
+  }
+
+  // Check for final message
   const finalMessage = collector.getFinalMessage();
-  expect(finalMessage).toBeDefined();
+  if (!finalMessage) {
+    throw new Error(
+      `Stream completed but final message is missing.\n` + `All events: [${eventTypes.join(", ")}]`
+    );
+  }
 }

 /**
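
The new failure messages interpolate `errorEvent.error` directly into a template literal. The type of that field is not visible in this diff; if it is always a string, nothing more is needed. If it can ever be an object, the interpolation would print `[object Object]`. The sketch below shows one way to render an unknown value for a log line. `formatErrorValue` is a hypothetical helper, not part of this commit.

```typescript
// Hypothetical helper (not in the commit): render an unknown error value as text.
function formatErrorValue(value: unknown): string {
  if (typeof value === "string") return value;
  if (value instanceof Error) return `${value.name}: ${value.message}`;
  try {
    // At runtime JSON.stringify yields undefined for inputs such as undefined or functions.
    const json: string | undefined = JSON.stringify(value);
    return json === undefined ? String(value) : json;
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    return String(value);
  }
}

// Made-up payload, used only to show the difference in output.
const objectError = { code: "rate_limited", retryable: true };
console.log(`${objectError}`); // prints "[object Object]"
console.log(formatErrorValue(objectError)); // prints {"code":"rate_limited","retryable":true}
```

With a helper like this, a message such as `Got stream-error: ${formatErrorValue(errorEvent.error)}` would stay readable whatever shape the payload takes.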

tests/ipcMain/sendMessage.test.ts

Lines changed: 18 additions & 2 deletions
@@ -833,7 +833,15 @@ describeIntegration("IpcMain sendMessage integration tests", () => {

       // Wait for stream to complete (longer timeout for tool policy tests)
       const collector = createEventCollector(env.sentEvents, workspaceId);
-      await collector.waitForEvent("stream-end", 30000);
+
+      // Wait for either stream-end or stream-error
+      // (helpers will log diagnostic info on failure)
+      await Promise.race([
+        collector.waitForEvent("stream-end", 30000),
+        collector.waitForEvent("stream-error", 30000),
+      ]);
+
+      // This will throw with detailed error info if stream didn't complete successfully
       assertStreamSuccess(collector);

       // Verify file still exists (bash tool was disabled, so deletion shouldn't have happened)
@@ -884,7 +892,15 @@ describeIntegration("IpcMain sendMessage integration tests", () => {

       // Wait for stream to complete (longer timeout for tool policy tests)
       const collector = createEventCollector(env.sentEvents, workspaceId);
-      await collector.waitForEvent("stream-end", 30000);
+
+      // Wait for either stream-end or stream-error
+      // (helpers will log diagnostic info on failure)
+      await Promise.race([
+        collector.waitForEvent("stream-end", 30000),
+        collector.waitForEvent("stream-error", 30000),
+      ]);
+
+      // This will throw with detailed error info if stream didn't complete successfully
       assertStreamSuccess(collector);

       // Verify file content unchanged (file_edit tools and bash were disabled)
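
`Promise.race` settles as soon as either waiter resolves, but it does not cancel the other one. Assuming `waitForEvent` keeps polling until its own timeout (its full body is not shown in this diff), the losing call may keep running and log a timeout after the assertion has already passed. A minimal sketch of an alternative is a single loop that accepts several event types. The names `StreamEvent`, `firstMatching`, and `waitForAnyEvent` are hypothetical and not part of this commit.

```typescript
// Hypothetical event shape; the real event types in tests/ipcMain/helpers.ts are not shown.
type StreamEvent = { type: string; error?: string };

// Return the first collected event whose type is one of the wanted types.
function firstMatching(events: StreamEvent[], wanted: string[]): StreamEvent | undefined {
  return events.find((e) => wanted.includes(e.type));
}

// Poll a snapshot function until any wanted event appears or the timeout elapses.
// Resolves to the matching event, or null on timeout. Only one loop runs, so
// nothing is left polling after the result is returned.
async function waitForAnyEvent(
  getEvents: () => StreamEvent[],
  wanted: string[],
  timeoutMs: number,
  pollMs = 50
): Promise<StreamEvent | null> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const hit = firstMatching(getEvents(), wanted);
    if (hit) return hit;
    if (Date.now() >= deadline) return null;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

In a test this might be called as `await waitForAnyEvent(() => collector.getEvents(), ["stream-end", "stream-error"], 30000)`, followed by `assertStreamSuccess(collector)` as before.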
