fix(webapp): stop writer DB connectivity errors leaking to trigger() API clients #3874
During trigger() worker-queue resolution, getWorkerQueue wraps any error from
getDefaultWorkerGroupForProject into a client-facing ServiceValidationError
(HTTP 422) carrying error.message. That method runs project.findFirst on the
*writer*; when the writer is unreachable Prisma throws P1001 ("Can't reach
database server at <host>"), and its raw message — including the DB hostname —
was echoed to the API client and surfaced in the customer's run view via the
SDK's TriggerApiError.
This also mis-classifies a transient outage: a 422 is not retried by the SDK,
so triggers failed permanently instead of riding out a brief writer blip.
Add isInfrastructureError() (Prisma connectivity codes P1001/P1002/P1008/P1017
plus init/panic/unknown classes) and, at the wrap site, rethrow infrastructure
errors so they hit the route's generic 500 handler (scrubbed + retryable);
only genuine domain failures (e.g. "Project not found.") become a 422.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Walkthrough
This PR prevents database infrastructure errors from leaking to API clients. It introduces a utility function that classifies Prisma errors as infrastructure-level (unreachable database, timeouts, connection failures) or application-level. The queue manager's error handling is updated to re-throw infrastructure errors unchanged so they reach the generic 500 handler, while application errors remain wrapped in domain-specific responses. Tests verify both the classification logic and the updated error handling path. A changelog entry documents the fix.
It imported queues.server.ts, which transitively starts DB/Redis-touching singletons; in the no-infra unit shard those reject as unhandled rejections and fail the run (passed locally only because docker services were up). The guard logic is covered by prismaErrors.test.ts; the full HTTP path belongs in a toxiproxy e2e, not the unit shard.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
During trigger() worker-queue resolution, getWorkerQueue wrapped any error from getDefaultWorkerGroupForProject into a client-facing ServiceValidationError (HTTP 422) carrying error.message. That method runs project.findFirst on the writer; when the writer is unreachable Prisma throws a connection error (P1001) whose message includes the database host, and that raw message was returned to the API client and surfaced in the run view via the SDK's TriggerApiError.
The 422 also mis-classified a transient outage: the SDK does not retry a 422, so triggers failed permanently instead of riding out a brief writer blip.
Design
This is the only place on the trigger path that folds a caught error's message into a client-facing error — every other DB failure on the path propagates to the route's generic 500 handler (scrubbed, and retried by the SDK). So the fix is local:
isInfrastructureError(): true for Prisma connection-level failures (the DB-unreachable family: P1001/P1002/P1008/P1017, plus the init/panic/unknown client error classes), false for query/validation errors (e.g. P2002).
Only P1001 ("can't reach database server") has been observed in practice; the rest of the connection family is included as same-class forward-proofing.
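For reviewers who want the shape of the classifier without opening the diff, here is a minimal sketch. It is not the code from this PR: it duck-types on the error's name and code instead of importing the Prisma error classes, and the constant names are hypothetical. The class names and error codes themselves are the ones Prisma documents.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Assumption: errors are matched structurally (by `name` and `code`) so the
// example runs without @prisma/client installed.

// Prisma connectivity codes listed in the description above.
const CONNECTIVITY_CODES = new Set(["P1001", "P1002", "P1008", "P1017"]);

// Prisma client error classes treated as infrastructure regardless of code
// (the "init/panic/unknown" classes).
const INFRA_ERROR_NAMES = new Set([
  "PrismaClientInitializationError",
  "PrismaClientRustPanicError",
  "PrismaClientUnknownRequestError",
]);

function isInfrastructureError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;

  const candidate = error as { name?: unknown; code?: unknown };

  // Whole classes of client failure that never describe a domain problem.
  if (typeof candidate.name === "string" && INFRA_ERROR_NAMES.has(candidate.name)) {
    return true;
  }

  // Known-request errors: only the connection family counts, so a
  // unique-constraint violation (P2002) stays an application error.
  return typeof candidate.code === "string" && CONNECTIVITY_CODES.has(candidate.code);
}
```

Matching on shape rather than instanceof is a choice made for this sketch only; it keeps the example dependency-free, at the cost of accepting any object that happens to carry one of these codes.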
Test plan
isInfrastructureError classifies a P1001 (incl. the Prisma 6.x PrismaClientKnownRequestError shape) and init errors as infrastructure; P2002 and a plain Error as not
getWorkerQueue rethrows a P1001 unchanged instead of wrapping it in a ServiceValidationError; still wraps a domain failure as a ServiceValidationError (RED on current code, GREEN after)
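The second test-plan item can be pictured with the sketch below. Everything in it is a stand-in written for illustration: resolveWorkerQueueSketch, the local ServiceValidationError class, and the simplified connectivity check are hypothetical and only mirror the behaviour described in this PR (rethrow infrastructure errors, wrap domain failures as a 422).

```typescript
// Hypothetical sketch of the wrap-site behaviour, not the PR's code.

// Stand-in for the webapp's client-facing validation error (HTTP 422).
class ServiceValidationError extends Error {
  readonly status: number;
  constructor(message: string, status = 422) {
    super(message);
    this.name = "ServiceValidationError";
    this.status = status;
  }
}

// Simplified classifier: only the connectivity codes named in this PR.
const CONNECTIVITY_CODES = new Set(["P1001", "P1002", "P1008", "P1017"]);
function isConnectivityError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const code = (error as { code?: unknown }).code;
  return typeof code === "string" && CONNECTIVITY_CODES.has(code);
}

// `lookup` stands in for the writer-backed default worker group lookup.
async function resolveWorkerQueueSketch(lookup: () => Promise<string>): Promise<string> {
  try {
    return await lookup();
  } catch (error) {
    // Infrastructure failure: rethrow the same object so the route's generic
    // 500 handler scrubs it and the SDK can retry.
    if (isConnectivityError(error)) throw error;

    // Genuine domain failure: safe to surface its message as a 422.
    const message = error instanceof Error ? error.message : "Unknown error";
    throw new ServiceValidationError(message);
  }
}
```

Rethrowing the original object, rather than a new generic error, keeps the stack and error code intact for server-side logging while leaving the client-facing message to the 500 handler.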