fix(router): map gRPC errors to proper HTTP codes (#31)#68
Open
Mark Chmarny (mchmarny) wants to merge 1 commit into
Open
fix(router): map gRPC errors to proper HTTP codes (#31)#68Mark Chmarny (mchmarny) wants to merge 1 commit into
Mark Chmarny (mchmarny) wants to merge 1 commit into
Conversation
The Envoy ext_proc handler previously wrapped resume errors with %w,
leaking the gRPC encoding ("rpc error: code = ... desc = ...") into the
response body, and routed every non-NotFound failure through 500. It
also returned a generic "not found" body that did not identify which
resource was missing.
This change introduces a small mapping layer that translates gRPC status
codes into appropriate HTTP status codes with short, client-safe bodies:
NotFound -> 404
FailedPrecondition -> 503 (desc preserved, e.g. "no free workers
available", which is actionable and not
security-sensitive)
Unavailable -> 503
DeadlineExceeded -> 504
PermissionDenied -> 403
Unauthenticated -> 401
ResourceExhausted -> 429
other -> 500 (no detail leak)
The original error is preserved via Unwrap so existing slog.ErrorContext
logging keeps full fidelity.
Layering: ActorResumer no longer translates codes.NotFound into a
router-layer sentinel; gRPC errors propagate untouched so the HTTP
boundary owns the mapping.
018f791 to
0b8af6e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #31.
The Envoy
ext_prochandler wrapped resume errors with%w, so the gRPC encoding leaked into the response body, and every non-NotFound failure returned 500. The 404 path returned"not found"without indicating what was missing.This change adds a small mapping layer that translates gRPC status codes to HTTP status codes with short, client-safe bodies:
NotFoundactor "<id>" not foundFailedPreconditionactor "<id>" unavailable: <desc>*Unavailableactor "<id>" unavailableDeadlineExceededactor "<id>" request timed outPermissionDeniedactor "<id>" access deniedUnauthenticatedactor "<id>" authentication requiredResourceExhaustedactor "<id>" rate limitederror resuming actor "<id>"*
FailedPreconditionis the only code whose gRPCdescis preserved verbatim; it carries actionable client-facing context like"no free workers available"and is not security-sensitive. Other codes deliberately drop thedescto avoid leaking server-side detail.The original error is preserved via
Unwrapso existingslog.ErrorContextlogging keeps full fidelity.Two smaller follow-ons made in the same change:
ActorResumerno longer translatescodes.NotFoundinto a router-layer sentinel. gRPC errors now propagate untouched so the HTTP boundary owns the mapping. The retry behavior is unchanged (the backoff already exits on any non-Abortederror).actor "<id>" routing failedinstead of the previous body that echoed back the invalid IP. The full IP is still logged server-side. This is deliberate; the IP is internal infrastructure detail and shouldn't appear in a client response.Before / after
Before:
After:
HTTP/1.1 503 Service Unavailable actor "ctr6" unavailable: no free workers availableTest plan
go test -race ./cmd/atenet/internal/app/router/...— all passgo vet ./...— cleanmake verify-fmt— cleanhack/verify-all.sh(boilerplate, go-generate, gofmt, licenses, mod-tidy) — all passmake build-atenet— succeedsmapResumeError, plus constructor and contract tests inerrors_test.go; existingextproc_test.gocases expanded to cover the new gRPC code mappings end-to-end throughhandleRequestHeaders.cc Tim Hockin (@thockin), is this what you had in mind in #31? PTAL