Conversation

@ammario (Member) commented Oct 6, 2025

Problem

The tool policy integration tests are flaky in CI, timing out waiting for stream-end events with no useful diagnostic information. When they fail, we only see:

```
Expected: true
Received: false
  at assertStreamSuccess (tests/ipcMain/helpers.ts:191:36)
```

This tells us nothing about what actually went wrong.

Solution

This PR makes general improvements to test helpers that benefit all integration tests:

1. Enhanced waitForEvent helper

  • Automatically logs stream-error details when timing out
  • Shows error message and error type for debugging
  • Makes all integration tests more debuggable (see the sketch after this list)
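
A rough sketch of what this could look like, purely for illustration: the `StreamEvent` shape, the polling approach, and the parameter names are assumptions, not the actual `tests/ipcMain/helpers.ts` implementation.

```typescript
// Illustrative sketch only -- event shape and polling are assumptions,
// not the real helpers.ts code.
interface StreamEvent {
  type: string;
  error?: { message: string; name?: string };
}

async function waitForEvent(
  events: StreamEvent[],
  eventType: string,
  timeoutMs = 30_000
): Promise<StreamEvent> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const match = events.find((e) => e.type === eventType);
    if (match) return match;
    // Poll the collected events until the deadline.
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
  // On timeout, surface any stream-error details instead of failing silently.
  const streamError = events.find((e) => e.type === "stream-error");
  if (streamError?.error) {
    console.error(
      `stream-error seen while waiting: ${streamError.error.message}` +
        (streamError.error.name ? ` (${streamError.error.name})` : "")
    );
  }
  throw new Error(`Timed out after ${timeoutMs}ms waiting for "${eventType}"`);
}
```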

2. Enhanced assertStreamSuccess helper

  • Replaced generic expect() calls with descriptive Error messages
  • Shows all collected events when any assertion fails
  • Distinguishes between different failure modes:
    • Stream didn't complete (no stream-end)
    • Stream errored (has stream-error)
    • Stream completed but missing final message
  • Includes the actual error message in the failure output (sketched below)
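
A sketch of the failure-mode handling described above. The `stream-end` and `stream-error` event types come from the PR description; the `final-message` event type and the `StreamEvent` shape (from the previous snippet) are assumptions.

```typescript
// Illustrative sketch; reuses the StreamEvent interface from the previous
// snippet. The "final-message" event type is an assumption here.
function assertStreamSuccess(events: StreamEvent[]): void {
  const dump = JSON.stringify(events, null, 2);

  // Failure mode 1: the stream errored.
  const streamError = events.find((e) => e.type === "stream-error");
  if (streamError) {
    throw new Error(
      `Stream errored: ${streamError.error?.message ?? "unknown error"}\n` +
        `Collected events:\n${dump}`
    );
  }

  // Failure mode 2: the stream never completed.
  if (!events.some((e) => e.type === "stream-end")) {
    throw new Error(
      `Stream did not complete (no stream-end event).\nCollected events:\n${dump}`
    );
  }

  // Failure mode 3: the stream completed but the final message is missing.
  if (!events.some((e) => e.type === "final-message")) {
    throw new Error(
      `Stream completed but the final message is missing.\nCollected events:\n${dump}`
    );
  }
}
```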

3. Fixed tool policy tests to wait for completion

  • Now wait for either stream-end OR stream-error (prevents timeout)
  • Rely on improved helpers for diagnostics
  • Much simpler and more maintainable (see the pattern sketched below)
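
The test-side pattern might look like the following sketch, under the same assumptions as above; `sentEvents` is a hypothetical name for the collected event array, not necessarily what the tests call it. Waiting on either terminal event means an errored stream fails fast with diagnostics instead of hanging until the timeout.

```typescript
// Sketch of the "wait for stream-end OR stream-error" pattern.
async function waitForStreamCompletion(
  events: StreamEvent[],
  timeoutMs = 30_000
): Promise<StreamEvent> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const done = events.find(
      (e) => e.type === "stream-end" || e.type === "stream-error"
    );
    if (done) return done;
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
  throw new Error(
    `Timed out after ${timeoutMs}ms waiting for stream-end or stream-error`
  );
}

// Hypothetical usage inside a tool policy test:
//   await waitForStreamCompletion(sentEvents);
//   assertStreamSuccess(sentEvents); // the helper now produces the diagnostics
```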

Impact

These changes benefit all integration tests, not just tool policy tests. Any test using waitForEvent or assertStreamSuccess will now provide much better error messages when failing.

Next Steps

Once this PR is merged and we see the tests fail in CI, we'll get detailed error logs that will help us:

  • Identify if the AI is trying to use disabled tools
  • See the actual API errors if any
  • Understand the sequence of events leading to failure
  • Fix the root cause instead of masking it with timeouts

_Generated with `cmux`_

Instead of timing out silently, the tests now:
- Wait for either stream-end OR stream-error
- Log all collected events for debugging
- Log the actual error message and error type if stream fails
- Fail with a clear message pointing to the logs

This will help us understand what's actually failing when the tests
are flaky in CI (e.g., is the AI trying to use disabled tools, or is
there an API error?)

_Generated with `cmux`_
Enhanced `waitForEvent` and `assertStreamSuccess` to provide better
error messages when tests fail:

**waitForEvent improvements:**
- When timing out, now automatically logs stream-error details if present
- Shows error message and error type for debugging

**assertStreamSuccess improvements:**
- Replaced generic expect() calls with descriptive Error messages
- Shows all collected events when any assertion fails
- Distinguishes between different failure modes:
  - Stream didn't complete (no stream-end)
  - Stream errored (has stream-error)
  - Stream completed but missing final message
- Includes the actual error message in the failure output

This makes test failures in CI much easier to diagnose without needing
to add test-specific logging everywhere.

_Generated with `cmux`_
Now that waitForEvent and assertStreamSuccess provide detailed
diagnostics, we can remove the test-specific logging code.

The tests now:
- Wait for either stream-end or stream-error (prevents timeout)
- Rely on helper functions for error diagnostics
- Are much more readable and maintainable

All the diagnostic logging happens automatically in the helpers,
making it a general fix for all integration tests, not just these two.

_Generated with `cmux`_
@ammario force-pushed the fix-tool-policy-test-flakiness branch from d2e6dcb to d343a40 on October 6, 2025 at 16:18
@ammario enabled auto-merge (squash) on October 6, 2025 at 17:25
@ammario merged commit ee62b00 into main on October 6, 2025
6 checks passed
@ammario deleted the fix-tool-policy-test-flakiness branch on October 6, 2025 at 17:27