fix(deployments): retry transient depot build init failures by myftija · Pull Request #2586 · triggerdotdev/trigger.dev

myftija · 2025-10-06T09:13:15Z

Depot build initialization via `depot.build.v1.BuildService.createBuild` fails surprisingly often due to transient errors, which causes the whole deployment to fail. This PR adds a simple retry mechanism with backoff using `p-retry`, which should reduce the deployment failure rate.
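As a rough illustration of what "retry with backoff" means here, the following is a minimal hand-rolled sketch and not the code in this PR, which delegates to `p-retry`. The helper name `retryWithBackoff`, its option names, and the injectable `sleep` are assumptions made so that the example is self-contained.

```typescript
// Hypothetical stand-in for p-retry, for illustration only.
type RetryOptions = {
  retries: number; // number of retries after the first attempt
  minTimeout: number; // delay before the first retry, in ms
  maxTimeout: number; // upper bound for any single delay, in ms
  factor?: number; // exponential growth factor (assumed default: 2)
  onFailedAttempt?: (error: unknown, attemptNumber: number, retriesLeft: number) => void;
  sleep?: (ms: number) => Promise<void>; // injectable so callers can skip real waiting
};

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  const { retries, minTimeout, maxTimeout, factor = 2, onFailedAttempt, sleep = defaultSleep } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const retriesLeft = retries - attempt;
      onFailedAttempt?.(error, attempt + 1, retriesLeft);
      // Out of retries: surface the last error to the caller.
      if (retriesLeft <= 0) throw error;
      // Exponential backoff, capped at maxTimeout.
      await sleep(Math.min(minTimeout * Math.pow(factor, attempt), maxTimeout));
    }
  }
}
```

A caller would wrap the flaky operation in a function and pass it as `fn`; the deployment then fails only if every attempt fails.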

changeset-bot · 2025-10-06T09:13:19Z

⚠️ No Changeset found

Latest commit: 0edcc7d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai · 2025-10-06T09:13:47Z

Walkthrough

Adds logging and braces for two error-handling cases in the deployment progress route: logs a warning on "failed_to_extend_deployment_timeout" while keeping a 204 response; logs an error on "failed_to_create_remote_build" and returns a 500 with a generic message.
Adds retry logic around remote build creation using p-retry with configured retries and timeouts, including onFailedAttempt logging.
Adds an Authorization header to the remote build request.
Imports a logger and p-retry in the remote image builder module.
Adds p-retry (^4.6.1) as a runtime dependency in apps/webapp/package.json.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The pull request description does not follow the repository’s required template and is missing multiple sections including the Closes statement, the ✅ Checklist, Testing steps, Changelog summary, and Screenshots placeholder, making it incomplete for reviewers. Without these elements the description lacks structured context, validation steps, and tracking information. Therefore it fails to meet the description template requirements. | Please update the PR description to include the “Closes #” reference, complete the checklist items, add a Testing section with the steps you performed, include a Changelog entry summarizing the change, and provide Screenshots if applicable. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title succinctly and accurately describes the primary change by indicating that deployment logic now retries transient failures when initializing Depot builds, and it follows the conventional commit style without extraneous details. It is clear, concise, and directly related to the main change in the pull request. |

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

apps/webapp/app/v3/remoteImageBuilder.server.ts (1)

18-36: Consider adding AbortError for non-retryable errors.

The retry logic is sound, but retrying all errors indiscriminately may waste resources on non-transient failures. Authentication failures (401, 403) and validation errors (400, 422) should abort immediately rather than retry.

Additionally, the onFailedAttempt callback could log attemptNumber and retriesLeft for better observability.

Apply this diff to handle non-retryable errors and improve logging:

 import pRetry from "p-retry";
+import { AbortError } from "p-retry";
 import { logger } from "~/services/logger.server";

   const result = await pRetry(
-    () =>
+    async () => {
+      try {
+        return await depot.build.v1.BuildService.createBuild(
-      depot.build.v1.BuildService.createBuild(
           { projectId: builderProjectId },
           {
             headers: {
               Authorization: `Bearer ${env.DEPOT_TOKEN}`,
             },
           }
         );
+      } catch (error: any) {
+        // Don't retry authentication or validation errors
+        if (error.code === "UNAUTHENTICATED" || error.code === "PERMISSION_DENIED" || error.code === "INVALID_ARGUMENT") {
+          throw new AbortError(error);
+        }
+        throw error;
+      }
+    },
     {
       retries: 3,
       minTimeout: 200,
       maxTimeout: 2000,
       onFailedAttempt: (error) => {
-        logger.error("Failed attempt to create remote Depot build", { error });
+        logger.error("Failed attempt to create remote Depot build", {
+          error,
+          attemptNumber: error.attemptNumber,
+          retriesLeft: error.retriesLeft,
+        });
       },
     }
   );

Note: Adjust the error code checks based on the actual error structure returned by the Depot SDK. You may need to inspect error.response?.status for HTTP status codes instead.
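The abort condition in the diff above can be factored into a small predicate, which also makes it easier to cover both error shapes mentioned in the note. This is a sketch only: the `code` and `response.status` fields are assumptions about how the Depot SDK surfaces errors, and the code and status lists simply mirror the ones named in this comment.

```typescript
// Assumed error shape; the real Depot SDK error structure may differ.
type MaybeCodedError = { code?: string; response?: { status?: number } };

// Codes and statuses taken from the review comment above.
const NON_RETRYABLE_CODES = new Set(["UNAUTHENTICATED", "PERMISSION_DENIED", "INVALID_ARGUMENT"]);
const NON_RETRYABLE_STATUSES = new Set([400, 401, 403, 422]);

// Returns true when retrying cannot help (auth or validation failures).
function isNonRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { code, response } = error as MaybeCodedError;
  if (typeof code === "string" && NON_RETRYABLE_CODES.has(code)) return true;
  const status = response?.status;
  return typeof status === "number" && NON_RETRYABLE_STATUSES.has(status);
}
```

Inside the retried function, the `catch` branch would then throw an `AbortError` when `isNonRetryable(error)` is true and rethrow the original error otherwise.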

Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b90f3e2 and 0edcc7d.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (3)

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts (1 hunks)
apps/webapp/app/v3/remoteImageBuilder.server.ts (2 hunks)
apps/webapp/package.json (1 hunks)

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations

Files:

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
apps/webapp/app/v3/remoteImageBuilder.server.ts

{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

We use zod a lot in packages/core and in the webapp

Files:

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
apps/webapp/app/v3/remoteImageBuilder.server.ts

apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

When importing from @trigger.dev/core in the webapp, never import the root package path; always use one of the documented subpath exports from @trigger.dev/core’s package.json

Files:

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
apps/webapp/app/v3/remoteImageBuilder.server.ts

{apps/webapp/app/**/*.server.{ts,tsx},apps/webapp/app/routes/**/*.ts}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access environment variables only via the env export from app/env.server.ts; do not reference process.env directly

Files:

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
apps/webapp/app/v3/remoteImageBuilder.server.ts

apps/webapp/app/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Modules intended for test consumption under apps/webapp/app/**/*.ts must not read environment variables; accept configuration via options instead

Files:

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts
apps/webapp/app/v3/remoteImageBuilder.server.ts

🧬 Code graph analysis (2)

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts (1)

packages/core/src/v3/apps/http.ts (1)

json (65-75)

apps/webapp/app/v3/remoteImageBuilder.server.ts (1)

apps/webapp/app/env.server.ts (1)

env (1203-1203)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)

GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: typecheck / typecheck
GitHub Check: Analyze (javascript-typescript)

🔇 Additional comments (5)

apps/webapp/package.json (1)

169-169: LGTM! Dependency addition supports retry logic.

The p-retry@^4.6.1 dependency is used in apps/webapp/app/v3/remoteImageBuilder.server.ts to handle transient Depot build failures. The version is stable and the caret range allows safe updates within v4.x.
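To make the retry timing concrete, the sketch below computes the wait schedule implied by the options visible in the review diff (`retries: 3`, `minTimeout: 200`, `maxTimeout: 2000`). It assumes an exponential factor of 2 and no randomization, which as far as I know are the defaults of the `retry` module that p-retry v4 builds on; `backoffDelays` is a hypothetical helper, not part of either library.

```typescript
// Delay before each retry: minTimeout * factor^attempt, capped at maxTimeout.
// Assumes factor = 2 and randomize = false.
function backoffDelays(retries: number, minTimeout: number, maxTimeout: number, factor = 2): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(Math.round(minTimeout * Math.pow(factor, attempt)), maxTimeout));
  }
  return delays;
}

const delays = backoffDelays(3, 200, 2000); // [200, 400, 800]
const totalWait = delays.reduce((sum, ms) => sum + ms, 0); // 1400
```

Under these assumptions a build init that fails on every attempt adds about 1.4 seconds of waiting, on top of the time taken by the four calls themselves, before the deployment is reported as failed. The 2000 ms cap is never reached with only three retries.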

apps/webapp/app/v3/remoteImageBuilder.server.ts (2)

6-7: LGTM! Imports correctly support retry and logging.

The imports follow the correct pattern: pRetry is imported as a default export (as per p-retry v4 API), and logger follows the project's logging pattern.

23-25: LGTM! Authorization header correctly added.

The Authorization header follows the same pattern used in createBuilderProjectIfNotExists (lines 57-59) and correctly uses env.DEPOT_TOKEN as per coding guidelines.

apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts (2)

58-61: LGTM! Logging improves observability for timeout extensions.

The warning log with error.cause provides useful debugging context while maintaining the 204 response behavior. Adding braces also improves maintainability if additional statements are needed later.

69-72: LGTM! Logging improves observability for remote build failures.

The error log with error.cause provides useful debugging context for Depot build creation failures. The 500 response correctly signals a server error while maintaining a generic client-facing message.
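The two route-level cases reviewed above can be summarized in a small sketch. The error type strings and status codes come from the walkthrough; everything else (the `ProgressError` and `Logger` shapes, the function name, the message texts, and the fallback branch) is hypothetical and only there to keep the example self-contained.

```typescript
// Assumed shapes; the real route uses the project's own error and logger types.
type ProgressError = { type: string; cause?: unknown };
type Logger = {
  warn: (message: string, meta?: Record<string, unknown>) => void;
  error: (message: string, meta?: Record<string, unknown>) => void;
};
type RouteResponse = { status: number; body: { error: string } | null };

function responseForProgressError(error: ProgressError, log: Logger): RouteResponse {
  switch (error.type) {
    case "failed_to_extend_deployment_timeout": {
      // Non-fatal: record it, but still answer 204 with no body.
      log.warn("Failed to extend deployment timeout", { cause: error.cause });
      return { status: 204, body: null };
    }
    case "failed_to_create_remote_build": {
      // Fatal: log the cause server-side, keep the client-facing message generic.
      log.error("Failed to create remote Depot build", { cause: error.cause });
      return { status: 500, body: { error: "Failed to create remote build" } };
    }
    default:
      // Placeholder for the route's other cases, which are not shown in this PR page.
      return { status: 500, body: { error: "Internal server error" } };
  }
}
```

Keeping the cause in the log and out of the response body matches the intent described in both comments: more context for operators, no internal details for clients.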

fix(deployments): retry transient depot build init failures

0edcc7d

The Depot build init with `depot.build.v1.BuildService.createBuild` fails surprisingly often due to transient errors. This PR adds a simple retry mechanism with backoff using `p-retry`.

nicktrn approved these changes Oct 6, 2025

coderabbitai bot reviewed Oct 6, 2025

ericallam approved these changes Oct 6, 2025

myftija merged commit 107f4dc into main Oct 6, 2025
31 checks passed

myftija deleted the fix-retry-depot-builds branch October 6, 2025 09:33
