
Commit bc01f6e

d-cs and claude authored
fix(webapp): stop writer DB connectivity errors leaking to trigger() API clients (#3874)
## Summary

During `trigger()` worker-queue resolution, `getWorkerQueue` wrapped any error from `getDefaultWorkerGroupForProject` into a client-facing `ServiceValidationError` (HTTP 422) carrying `error.message`. That method runs `project.findFirst` on the **writer**; when the writer is unreachable, Prisma throws a connection error (P1001) whose message includes the database host, and that raw message was returned to the API client and surfaced in the run view via the SDK's `TriggerApiError`.

It also mis-classifies a transient outage: a 422 is not retried by the SDK, so triggers failed permanently instead of riding out a brief writer blip.

## Design

This is the only place on the trigger path that folds a *caught* error's message into a client-facing error; every other DB failure on the path propagates to the route's generic 500 handler (scrubbed, and retried by the SDK). So the fix is local:

- Add `isInfrastructureError()`: true for Prisma connection-level failures (the DB-unreachable family: P1001/P1002/P1008/P1017, plus the init/panic/unknown client error classes), false for query/validation errors (e.g. P2002).
- At the wrap site, rethrow infrastructure errors so they reach the generic 500 handler (no raw message, and retryable). Genuine domain failures (e.g. "Project not found.") still become a 422.

Only P1001 ("can't reach database server") has been observed in practice; the rest of the connection family is included as same-class forward-proofing.

## Test plan

- [x] Unit: `isInfrastructureError` classifies a P1001 (including the Prisma 6.x `PrismaClientKnownRequestError` shape) and init errors as infrastructure; P2002 and a plain `Error` as not
- [x] `getWorkerQueue` rethrows a P1001 unchanged instead of wrapping it in a `ServiceValidationError`; still wraps a domain failure as a `ServiceValidationError` (RED on current code, GREEN after)
- [ ] (optional) toxiproxy e2e: trigger with the writer cut → HTTP 500 generic body, no DB host in the response

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 3bc88c4 commit bc01f6e

4 files changed

Lines changed: 89 additions & 0 deletions


Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+---
+area: webapp
+type: fix
+---
+
+Stop `trigger()` from leaking raw database connection errors to API clients during a database outage; infrastructure errors now return a generic, retryable 500.

apps/webapp/app/runEngine/concerns/queues.server.ts

Lines changed: 12 additions & 0 deletions
@@ -15,6 +15,7 @@ import type { RunEngine } from "~/v3/runEngine.server";
 import { env } from "~/env.server";
 import { tryCatch } from "@trigger.dev/core/v3";
 import { ServiceValidationError } from "~/v3/services/common.server";
+import { isInfrastructureError } from "~/utils/prismaErrors";
 import { createCache, createLRUMemoryStore, DefaultStatefulContext, Namespace } from "@internal/cache";
 import { singleton } from "~/utils/singleton";
 import type { TaskMetadataCache, TaskMetadataEntry } from "~/services/taskMetadataCache.server";
@@ -394,6 +395,17 @@ export class DefaultQueueManager implements QueueManager {
     );
 
     if (error) {
+      // getDefaultWorkerGroupForProject queries the writer DB. A Prisma
+      // infrastructure error (e.g. P1001 "Can't reach database server", whose
+      // message carries the DB hostname) must NOT be promoted into a
+      // client-facing ServiceValidationError: that leaks internal infra detail
+      // to the API client (the SDK echoes it into the run view) and
+      // mis-classifies a transient outage as a non-retryable 422. Let it
+      // propagate to the route's generic 500 handler (scrubbed + retryable);
+      // only wrap genuine domain failures.
+      if (isInfrastructureError(error)) {
+        throw error;
+      }
       throw new ServiceValidationError(error.message);
     }
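The effect of this hunk can be illustrated with a minimal, self-contained sketch. It is not the webapp's code: `FakeConnectionError`, `resolveWorkerQueue`, `handleTrigger`, the response shape, and the hostname `db.internal.example` are all hypothetical stand-ins, and the real route handler may differ. It only assumes what the commit message states: domain failures become a 422 carrying their message, while anything else falls through to a generic 500.

```typescript
// Hypothetical stand-in for a Prisma connectivity error (not the real Prisma class).
class FakeConnectionError extends Error {
  code: string;
  constructor(message: string, code: string) {
    super(message);
    this.code = code;
    // Keeps `instanceof` working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Stand-in for the webapp's ServiceValidationError (mapped to HTTP 422 below).
class ServiceValidationError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

const INFRA_CODES = new Set(["P1001", "P1002", "P1008", "P1017"]);

// Simplified classifier: only checks the stand-in class and its code.
function isInfrastructureError(error: unknown): boolean {
  return error instanceof FakeConnectionError && INFRA_CODES.has(error.code);
}

// The wrap site: rethrow infrastructure errors untouched, wrap everything else.
function resolveWorkerQueue(lookup: () => string): string {
  try {
    return lookup();
  } catch (error) {
    if (isInfrastructureError(error)) {
      throw error;
    }
    throw new ServiceValidationError(error instanceof Error ? error.message : String(error));
  }
}

type HttpResponse = { status: number; body: string };

// Assumed route behaviour: 422 echoes the domain message, anything else is a scrubbed 500.
function handleTrigger(lookup: () => string): HttpResponse {
  try {
    return { status: 200, body: resolveWorkerQueue(lookup) };
  } catch (error) {
    if (error instanceof ServiceValidationError) {
      return { status: 422, body: error.message };
    }
    return { status: 500, body: "Internal Server Error" };
  }
}

const outage = handleTrigger(() => {
  throw new FakeConnectionError("Can't reach database server at db.internal.example:5432", "P1001");
});
const missing = handleTrigger(() => {
  throw new Error("Project not found.");
});

console.log(outage);
console.log(missing);
```

In this sketch the outage response carries no trace of the hostname, while the domain failure still reaches the client with its original message.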

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+import { Prisma } from "@trigger.dev/database";
+
+// Prisma connectivity / infrastructure error codes — engine- and
+// connection-level failures, not query- or validation-level ones. When the
+// database is unreachable, Prisma 6.x throws a PrismaClientKnownRequestError
+// carrying one of these codes (e.g. P1001 "Can't reach database server").
+const INFRASTRUCTURE_PRISMA_CODES = new Set([
+  "P1001", // Can't reach database server
+  "P1002", // Database server reached but timed out
+  "P1008", // Operations timed out
+  "P1017", // Server has closed the connection
+]);
+
+/**
+ * True when `error` is a Prisma infrastructure/connectivity failure (DB
+ * unreachable, timed out, connection dropped) rather than a query- or
+ * validation-level error.
+ *
+ * These errors carry internal infrastructure detail (e.g. the database
+ * hostname) in their `.message`, so they must never be surfaced to API
+ * clients — callers should let them propagate to the generic 5xx handler
+ * (which both scrubs the message and is retryable by the SDK) instead of
+ * folding `.message` into a client-facing error.
+ */
+export function isInfrastructureError(error: unknown): boolean {
+  if (
+    error instanceof Prisma.PrismaClientInitializationError ||
+    error instanceof Prisma.PrismaClientRustPanicError ||
+    error instanceof Prisma.PrismaClientUnknownRequestError
+  ) {
+    return true;
+  }
+
+  if (error instanceof Prisma.PrismaClientKnownRequestError) {
+    return INFRASTRUCTURE_PRISMA_CODES.has(error.code);
+  }
+
+  return false;
+}
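The doc comment above notes that the generic 5xx path is retryable by the SDK, whereas the commit message says a 422 is not. The sketch below shows why that classification matters during a brief writer blip. It is not the SDK's retry logic: `isRetryableStatus`, `triggerWithRetry`, and `blip` are hypothetical, the rule "5xx is retryable, everything else is final" is an assumption, and backoff is omitted for brevity.

```typescript
// Assumed policy for illustration only: retry 5xx responses, treat the rest as final.
function isRetryableStatus(status: number): boolean {
  return status >= 500 && status <= 599;
}

// Calls `send` until it returns a non-retryable status or attempts run out.
function triggerWithRetry(
  send: () => number,
  maxAttempts: number
): { status: number; attempts: number } {
  let status = send();
  let attempts = 1;
  while (attempts < maxAttempts && isRetryableStatus(status)) {
    status = send();
    attempts += 1;
  }
  return { status, attempts };
}

// Simulates a writer blip: the first two calls fail with `failStatus`, then succeed.
function blip(failStatus: number): () => number {
  let calls = 0;
  return () => {
    calls += 1;
    return calls <= 2 ? failStatus : 200;
  };
}

// Before the fix the outage surfaced as 422, so the client gave up immediately.
const before = triggerWithRetry(blip(422), 5);
// After the fix it surfaces as 500, so the client retries and rides out the blip.
const after = triggerWithRetry(blip(500), 5);

console.log(before); // { status: 422, attempts: 1 }
console.log(after); // { status: 200, attempts: 3 }
```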
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+import { describe, expect, it } from "vitest";
+import { Prisma } from "@trigger.dev/database";
+import { isInfrastructureError } from "../app/utils/prismaErrors.js";
+
+describe("isInfrastructureError", () => {
+  it("treats a P1001 'can't reach database server' (KnownRequestError) as infrastructure", () => {
+    // Prisma 6.x reports P1001 as a PrismaClientKnownRequestError with code P1001 —
+    // this is the exact production shape that leaked the RDS hostname to a customer.
+    const err = new Prisma.PrismaClientKnownRequestError(
+      "Invalid `prisma.project.findFirst()` invocation: Can't reach database server at host:5432",
+      { code: "P1001", clientVersion: "6.14.0" }
+    );
+    expect(isInfrastructureError(err)).toBe(true);
+  });
+
+  it("treats a PrismaClientInitializationError as infrastructure", () => {
+    const err = new Prisma.PrismaClientInitializationError("init failed", "6.14.0");
+    expect(isInfrastructureError(err)).toBe(true);
+  });
+
+  it("does NOT treat a query/validation error (P2002 unique constraint) as infrastructure", () => {
+    const err = new Prisma.PrismaClientKnownRequestError("Unique constraint failed", {
+      code: "P2002",
+      clientVersion: "6.14.0",
+    });
+    expect(isInfrastructureError(err)).toBe(false);
+  });
+
+  it("does NOT treat a plain domain Error as infrastructure", () => {
+    expect(isInfrastructureError(new Error("Project not found."))).toBe(false);
+  });
+});
