fix(deployments): retry transient depot build init failures#2586
fix(deployments): retry transient depot build init failures#2586
Conversation
The Depot build init with `depot.build.v1.BuildService.createBuild` fails surprisingly often due to transient error. This PR adds a simple retry mechanism with backoff using `p-retry`.
|
Walkthrough
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
apps/webapp/app/v3/remoteImageBuilder.server.ts (1)
18-36: Consider adding AbortError for non-retryable errors.The retry logic is sound, but retrying all errors indiscriminately may waste resources on non-transient failures. Authentication failures (401, 403) and validation errors (400, 422) should abort immediately rather than retry.
Additionally, the
onFailedAttemptcallback could logattemptNumberandretriesLeftfor better observability.Apply this diff to handle non-retryable errors and improve logging:
import pRetry from "p-retry"; +import { AbortError } from "p-retry"; import { logger } from "~/services/logger.server";const result = await pRetry( - () => + async () => { + try { + return await depot.build.v1.BuildService.createBuild( - depot.build.v1.BuildService.createBuild( { projectId: builderProjectId }, { headers: { Authorization: `Bearer ${env.DEPOT_TOKEN}`, }, } ); + } catch (error: any) { + // Don't retry authentication or validation errors + if (error.code === "UNAUTHENTICATED" || error.code === "PERMISSION_DENIED" || error.code === "INVALID_ARGUMENT") { + throw new AbortError(error); + } + throw error; + } + }, { retries: 3, minTimeout: 200, maxTimeout: 2000, onFailedAttempt: (error) => { - logger.error("Failed attempt to create remote Depot build", { error }); + logger.error("Failed attempt to create remote Depot build", { + error, + attemptNumber: error.attemptNumber, + retriesLeft: error.retriesLeft, + }); }, } );Note: Adjust the error code checks based on the actual error structure returned by the Depot SDK. You may need to inspect
error.response?.statusfor HTTP status codes instead.Based on learnings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (3)
apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts(1 hunks)apps/webapp/app/v3/remoteImageBuilder.server.ts(2 hunks)apps/webapp/package.json(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Always prefer using isomorphic code like fetch, ReadableStream, etc. instead of Node.js specific code
For TypeScript, we usually use types over interfaces
Avoid enums
No default exports, use function declarations
Files:
apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.tsapps/webapp/app/v3/remoteImageBuilder.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
We use zod a lot in packages/core and in the webapp
Files:
apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.tsapps/webapp/app/v3/remoteImageBuilder.server.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
When importing from @trigger.dev/core in the webapp, never import the root package path; always use one of the documented subpath exports from @trigger.dev/core’s package.json
Files:
apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.tsapps/webapp/app/v3/remoteImageBuilder.server.ts
{apps/webapp/app/**/*.server.{ts,tsx},apps/webapp/app/routes/**/*.ts}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Access environment variables only via the env export from app/env.server.ts; do not reference process.env directly
Files:
apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.tsapps/webapp/app/v3/remoteImageBuilder.server.ts
apps/webapp/app/**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Modules intended for test consumption under apps/webapp/app/**/*.ts must not read environment variables; accept configuration via options instead
Files:
apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.tsapps/webapp/app/v3/remoteImageBuilder.server.ts
🧬 Code graph analysis (2)
apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts (1)
packages/core/src/v3/apps/http.ts (1)
json(65-75)
apps/webapp/app/v3/remoteImageBuilder.server.ts (1)
apps/webapp/app/env.server.ts (1)
env(1203-1203)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: typecheck / typecheck
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (5)
apps/webapp/package.json (1)
169-169: LGTM! Dependency addition supports retry logic.The
p-retry@^4.6.1dependency is used inapps/webapp/app/v3/remoteImageBuilder.server.tsto handle transient Depot build failures. The version is stable and the caret range allows safe updates within v4.x.apps/webapp/app/v3/remoteImageBuilder.server.ts (2)
6-7: LGTM! Imports correctly support retry and logging.The imports follow the correct pattern:
pRetryis imported as a default export (as per p-retry v4 API), andloggerfollows the project's logging pattern.
23-25: LGTM! Authorization header correctly added.The Authorization header follows the same pattern used in
createBuilderProjectIfNotExists(lines 57-59) and correctly usesenv.DEPOT_TOKENas per coding guidelines.apps/webapp/app/routes/api.v1.deployments.$deploymentId.progress.ts (2)
58-61: LGTM! Logging improves observability for timeout extensions.The warning log with
error.causeprovides useful debugging context while maintaining the 204 response behavior. Adding braces also improves maintainability if additional statements are needed later.
69-72: LGTM! Logging improves observability for remote build failures.The error log with
error.causeprovides useful debugging context for Depot build creation failures. The 500 response correctly signals a server error while maintaining a generic client-facing message.
The Depot build init with
depot.build.v1.BuildService.createBuildfails surprisingly often due to transient errors, causing the whole deployment to fail. This PR adds a simple retry mechanism with backoff usingp-retry. This should improve the failure rate.