

nicktrn commented Jan 30, 2026

Summary

  • When a child process crashes and a retry (RETRY_IMMEDIATELY) is attempted on the same TaskRunProcess, execute() hangs forever because the IPC send is silently skipped and the attempt promise can never resolve
  • This caused runner pods to stay up indefinitely with no heartbeats or polls
  • Fix: reject the attempt promise immediately when the child is not connected, so the controller can proceed to warm start or exit
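The fix amounts to a guard around the IPC send. The sketch below models it under stated assumptions: the child is reduced to the two members Node's ChildProcess exposes for this purpose (connected, and a send() that returns a boolean), while TaskRunProcessSketch, the message shape, and this UnexpectedExitError class are hypothetical stand-ins, not the actual code in taskRunProcess.ts.

```typescript
// Hypothetical, simplified model of the fix. ChildLike mirrors the
// `connected` flag and boolean-returning `send()` of Node's ChildProcess;
// everything else here is illustrative, not the real TaskRunProcess API.
interface ChildLike {
  connected: boolean;
  send(message: unknown): boolean;
}

class UnexpectedExitError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnexpectedExitError";
  }
}

class TaskRunProcessSketch {
  private readonly child: ChildLike;

  constructor(child: ChildLike) {
    this.child = child;
  }

  execute(payload: unknown): Promise<unknown> {
    let rejectAttempt: (error: Error) => void = () => {};
    const attempt = new Promise<unknown>((_resolve, reject) => {
      // In the real process, an IPC message handler (not modeled here)
      // would settle the attempt when the child reports a result.
      rejectAttempt = reject;
    });

    if (this.child.connected) {
      // Hypothetical message shape.
      this.child.send({ type: "execute", payload });
    } else {
      // The fix: without this branch the send is skipped, nothing ever
      // settles `attempt`, and the caller awaits forever.
      rejectAttempt(new UnexpectedExitError("Child process is not connected"));
    }

    return attempt;
  }
}
```

With a dead child (connected: false), execute() now rejects immediately instead of returning a promise that never settles; with a live child it sends once and waits for the reply.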

Test plan

  • Added taskRunProcess.test.ts — verifies execute() rejects promptly instead of hanging when the child process is dead
  • Deploy and verify no more stuck runner pods accumulate over time
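Verifying that execute() "rejects promptly instead of hanging" needs a test that cannot itself hang. One common approach, sketched here as an assumption rather than the contents of taskRunProcess.test.ts, is to race the call against a timer; settleWithin is a hypothetical helper and its 1000 ms default is arbitrary.

```typescript
// Race a promise against a timer so a hang fails fast instead of
// stalling the test run.
type Outcome = "resolved" | "rejected" | "timeout";

async function settleWithin(promise: Promise<unknown>, ms = 1000): Promise<Outcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Outcome>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  // Map both settlement paths to a label so a rejection is not "unhandled".
  const settled = promise.then(
    (): Outcome => "resolved",
    (): Outcome => "rejected",
  );
  try {
    return await Promise.race([settled, timeout]);
  } finally {
    // Clear the timer so it does not keep the event loop alive.
    clearTimeout(timer);
  }
}
```

A test would then assert that settleWithin(proc.execute(payload)) yields "rejected"; against the pre-fix behavior described above, the same call would yield "timeout".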


When a child process crashes and a retry is attempted on the same
TaskRunProcess, execute() would hang forever because the IPC send
was silently skipped and the attempt promise could never resolve.
This caused runner pods to stay up indefinitely with no heartbeats.
@changeset-bot

changeset-bot bot commented Jan 30, 2026

🦋 Changeset detected

Latest commit: f3049f6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 28 packages
Name Type
trigger.dev Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
references-telemetry Patch
@trigger.dev/build Patch
@trigger.dev/core Patch
@trigger.dev/python Patch
@trigger.dev/react-hooks Patch
@trigger.dev/redis-worker Patch
@trigger.dev/rsc Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
@internal/sdk-compat-tests Patch


@coderabbitai

coderabbitai bot commented Jan 30, 2026

Walkthrough

This pull request fixes a hang issue when execute() is called on a dead child process in TaskRunProcess. A new check is added to the execute method that detects when the IPC channel to the child process is not connected and immediately rejects the pending attempt with an UnexpectedExitError, marking the attempt status as REJECTED. A corresponding test validates this behavior and ensures the code does not hang in this scenario.
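The walkthrough names two effects of the new check: the pending attempt is rejected with an UnexpectedExitError, and its status is set to REJECTED. A minimal sketch of an attempt record that keeps the two in sync follows; createAttempt, rejectIfDisconnected, and the status names other than REJECTED are hypothetical, and the settle-at-most-once guard is a design choice of this sketch rather than something the PR states.

```typescript
// Hypothetical attempt record: a promise, the handles that settle it, and
// a status field other code can inspect.
type AttemptStatus = "PENDING" | "RESOLVED" | "REJECTED";

class UnexpectedExitError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "UnexpectedExitError";
  }
}

interface Attempt<T> {
  status: AttemptStatus;
  promise: Promise<T>;
  resolve(value: T): void;
  reject(error: Error): void;
}

function createAttempt<T>(): Attempt<T> {
  let resolvePromise: (value: T) => void = () => {};
  let rejectPromise: (error: Error) => void = () => {};
  const promise = new Promise<T>((resolve, reject) => {
    resolvePromise = resolve;
    rejectPromise = reject;
  });

  const attempt: Attempt<T> = {
    status: "PENDING",
    promise,
    resolve(value) {
      if (attempt.status !== "PENDING") return; // settle at most once
      attempt.status = "RESOLVED";
      resolvePromise(value);
    },
    reject(error) {
      if (attempt.status !== "PENDING") return; // settle at most once
      attempt.status = "REJECTED";
      rejectPromise(error);
    },
  };
  return attempt;
}

// Mirrors the new branch: if the IPC channel is down, fail the attempt now.
function rejectIfDisconnected<T>(attempt: Attempt<T>, connected: boolean): boolean {
  if (connected) return false;
  attempt.reject(new UnexpectedExitError("Child process is not connected"));
  return true;
}
```

Because the status flips in the same call that rejects the promise, anything inspecting the attempt afterwards sees REJECTED rather than a pending attempt that will never complete.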

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (⚠️ Warning): The pull request description provides a clear problem statement, solution, and test plan, but is missing several required sections from the template. Resolution: add the missing required sections (Checklist with its required items, Testing with detailed test steps, and Changelog) so the description follows the template structure more closely.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title clearly and concisely describes the main fix, rejecting execute() immediately when the child process is dead, which directly addresses the core issue in the changeset.



nicktrn requested a review from Copilot January 30, 2026 16:17

Copilot AI left a comment


Pull request overview

This PR fixes a critical bug where runner pods would hang indefinitely when attempting to retry a task execution on a crashed child process. The execute() method would silently skip the IPC send to the dead process but never resolve or reject its attempt promise, causing the runner to stop processing work without exiting.

Changes:

  • Modified TaskRunProcess.execute() to immediately reject the attempt promise when the child process is not connected
  • Added a test (taskRunProcess.test.ts) verifying that execute() rejects instead of hanging when the child process is dead
  • Added changeset documenting the patch
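Rejecting matters because it hands control back to the caller, which the overview says previously stopped processing work without exiting. As a sketch of the caller's side, assuming a hypothetical Executor interface and a canWarmStart callback (neither is a trigger.dev API), a rejected execute() now routes the controller to a warm start or an exit:

```typescript
// Hypothetical controller step; Executor and canWarmStart are placeholders.
interface Executor {
  execute(payload: unknown): Promise<unknown>;
}

type NextStep = "completed" | "warm-start" | "exit";

async function runAttempt(
  proc: Executor,
  payload: unknown,
  canWarmStart: () => boolean,
): Promise<NextStep> {
  try {
    await proc.execute(payload);
    return "completed";
  } catch {
    // For a dead child this catch only runs if execute() rejects;
    // a promise that never settles would leave the caller stuck here.
    return canWarmStart() ? "warm-start" : "exit";
  }
}
```

Before the fix, the await never returned for a dead child, so neither branch ran, which matches the symptom of pods staying up with no heartbeats or polls.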

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Files and descriptions:
  • packages/cli-v3/src/executions/taskRunProcess.ts: added an else branch that rejects the attempt promise when the child process is disconnected
  • packages/cli-v3/src/executions/taskRunProcess.test.ts: new test file verifying that execute() rejects promptly instead of hanging on dead processes
  • .changeset/fix-dead-process-execute-hang.md: changeset documenting the bug fix

nicktrn marked this pull request as ready for review January 30, 2026 16:38

devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional flags.


nicktrn merged commit 279102c into main Jan 30, 2026
40 checks passed
nicktrn deleted the fix/dead-process-execute-hang branch January 30, 2026 16:44
NERLOE pushed a commit to NERLOE/trigger.dev that referenced this pull request Jan 30, 2026
…iggerdotdev#2978)

