Conversation

@ammario (Member) commented Oct 6, 2025

Problem

The tool policy integration tests are flaky in CI, timing out waiting for stream-end events with no useful diagnostic information. When they fail, we only see:

```
Expected: true
Received: false
  at assertStreamSuccess (tests/ipcMain/helpers.ts:191:36)
```

This tells us nothing about what actually went wrong.

Solution

This PR makes general improvements to test helpers that benefit all integration tests:

1. Enhanced waitForEvent helper

  • Automatically logs stream-error details when timing out
  • Shows error message and error type for debugging
  • Makes all integration tests more debuggable (see the sketch after this list)
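
A rough sketch of what this could look like, purely for illustration: the `StreamEvent` shape, the polling approach, and the parameter names are assumptions, not the actual `tests/ipcMain/helpers.ts` implementation.

```typescript
// Illustrative sketch only -- event shape and polling are assumptions,
// not the real helpers.ts code.
interface StreamEvent {
  type: string;
  error?: { message: string; name?: string };
}

async function waitForEvent(
  events: StreamEvent[],
  eventType: string,
  timeoutMs = 30_000
): Promise<StreamEvent> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const match = events.find((e) => e.type === eventType);
    if (match) return match;
    // Poll the collected events until the deadline.
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
  // On timeout, surface any stream-error details instead of failing silently.
  const streamError = events.find((e) => e.type === "stream-error");
  if (streamError?.error) {
    console.error(
      `stream-error seen while waiting: ${streamError.error.message}` +
        (streamError.error.name ? ` (${streamError.error.name})` : "")
    );
  }
  throw new Error(`Timed out after ${timeoutMs}ms waiting for "${eventType}"`);
}
```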

2. Enhanced assertStreamSuccess helper

  • Replaced generic expect() calls with descriptive Error messages
  • Shows all collected events when any assertion fails
  • Distinguishes between different failure modes:
    • Stream didn't complete (no stream-end)
    • Stream errored (has stream-error)
    • Stream completed but missing final message
  • Includes the actual error message in the failure output (sketched below)
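
A sketch of the failure-mode handling described above. The `stream-end` and `stream-error` event types come from the PR description; the `final-message` event type and the `StreamEvent` shape (from the previous snippet) are assumptions.

```typescript
// Illustrative sketch; reuses the StreamEvent interface from the previous
// snippet. The "final-message" event type is an assumption here.
function assertStreamSuccess(events: StreamEvent[]): void {
  const dump = JSON.stringify(events, null, 2);

  // Failure mode 1: the stream errored.
  const streamError = events.find((e) => e.type === "stream-error");
  if (streamError) {
    throw new Error(
      `Stream errored: ${streamError.error?.message ?? "unknown error"}\n` +
        `Collected events:\n${dump}`
    );
  }

  // Failure mode 2: the stream never completed.
  if (!events.some((e) => e.type === "stream-end")) {
    throw new Error(
      `Stream did not complete (no stream-end event).\nCollected events:\n${dump}`
    );
  }

  // Failure mode 3: the stream completed but the final message is missing.
  if (!events.some((e) => e.type === "final-message")) {
    throw new Error(
      `Stream completed but the final message is missing.\nCollected events:\n${dump}`
    );
  }
}
```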

3. Fixed tool policy tests to wait for completion

  • Now wait for either stream-end OR stream-error (prevents timeout)
  • Rely on improved helpers for diagnostics
  • Much simpler and more maintainable (see the pattern sketched below)
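
The test-side pattern might look like the following sketch, under the same assumptions as above; `sentEvents` is a hypothetical name for the collected event array, not necessarily what the tests call it. Waiting on either terminal event means an errored stream fails fast with diagnostics instead of hanging until the timeout.

```typescript
// Sketch of the "wait for stream-end OR stream-error" pattern.
async function waitForStreamCompletion(
  events: StreamEvent[],
  timeoutMs = 30_000
): Promise<StreamEvent> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const done = events.find(
      (e) => e.type === "stream-end" || e.type === "stream-error"
    );
    if (done) return done;
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
  throw new Error(
    `Timed out after ${timeoutMs}ms waiting for stream-end or stream-error`
  );
}

// Hypothetical usage inside a tool policy test:
//   await waitForStreamCompletion(sentEvents);
//   assertStreamSuccess(sentEvents); // the helper now produces the diagnostics
```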

Impact

These changes benefit all integration tests, not just tool policy tests. Any test using waitForEvent or assertStreamSuccess will now provide much better error messages when failing.

Next Steps

Once this PR is merged and we see the tests fail in CI, we'll get detailed error logs that will help us:

  • Identify if the AI is trying to use disabled tools
  • See the actual API errors if any
  • Understand the sequence of events leading to failure
  • Fix the root cause instead of masking it with timeouts

_Generated with `cmux`_

Instead of timing out silently, the tests now:
- Wait for either stream-end OR stream-error
- Log all collected events for debugging
- Log the actual error message and error type if stream fails
- Fail with a clear message pointing to the logs

This will help us understand what's actually failing when the tests
are flaky in CI (e.g., is the AI trying to use disabled tools, or is
there an API error?)

_Generated with `cmux`_
Enhanced `waitForEvent` and `assertStreamSuccess` to provide better
error messages when tests fail:

**waitForEvent improvements:**
- When timing out, now automatically logs stream-error details if present
- Shows error message and error type for debugging

**assertStreamSuccess improvements:**
- Replaced generic expect() calls with descriptive Error messages
- Shows all collected events when any assertion fails
- Distinguishes between different failure modes:
  - Stream didn't complete (no stream-end)
  - Stream errored (has stream-error)
  - Stream completed but missing final message
- Includes the actual error message in the failure output

This makes test failures in CI much easier to diagnose without needing
to add test-specific logging everywhere.

_Generated with `cmux`_
Now that waitForEvent and assertStreamSuccess provide detailed
diagnostics, we can remove the test-specific logging code.

The tests now:
- Wait for either stream-end or stream-error (prevents timeout)
- Rely on helper functions for error diagnostics
- Are much more readable and maintainable

All the diagnostic logging happens automatically in the helpers,
making it a general fix for all integration tests, not just these two.

_Generated with `cmux`_
@ammario force-pushed the fix-tool-policy-test-flakiness branch from d2e6dcb to d343a40 on October 6, 2025 at 16:18
@ammario enabled auto-merge (squash) on October 6, 2025 at 17:25
@ammario merged commit ee62b00 into main on October 6, 2025
6 checks passed
@ammario deleted the fix-tool-policy-test-flakiness branch on October 6, 2025 at 17:27