Skip to content

fix(webhook-agent-ingest): stop the internal-key gate from blocking token-authed callbacks#3719

Merged
St0rmz1 merged 1 commit into
mainfrom
fix/webhook-callback-auth-401
Jun 3, 2026
Merged

fix(webhook-agent-ingest): stop the internal-key gate from blocking token-authed callbacks#3719
St0rmz1 merged 1 commit into
mainfrom
fix/webhook-callback-auth-401

Conversation

@St0rmz1
Copy link
Copy Markdown
Contributor

@St0rmz1 St0rmz1 commented Jun 3, 2026

Summary

Webhook triggered Cloud Agent sessions finished their work, but the webhook request stayed inprogress in the UI forever.

Root cause: completion callbacks authenticate with a scoped X-Callback-Token that the callbacks route validates. They are not meant to present the internal API key. But the callbacks route is mounted under /api, and the /api router runs an internal API key middleware on every path beneath it. That middleware returned 401 for the callback before it reached the route that validates the token the callback already carried. Cloud Agent delivery treats a 401 as permanent and does not retry, so the request was never moved out of inprogress.

The fix lets the /api/callbacks subtree skip the internal API key check so its own token validation runs. The route stays authenticated, by the token it already carries rather than the internal key. The callback URL is unchanged, so URLs already stored in running sessions keep resolving, with no transition window.

Architecture note for an unfamiliar reviewer: inbound webhooks already work because they are mounted at /inbound, outside /api. The callbacks route was the only token validated route trapped behind the internal API key gate.

The functional change is a single guard in internalApiMiddleware; the rest of the diff is a comment and tests.

Verification

This is auth and routing logic. The failure it fixes only reproduces with a live webhook trigger and a full session, so manual testing focused on confirming the root cause rather than a deployed run:

  • Confirmed in production logs that the stuck request was caused by a 401 on callback delivery (delivery happened, the endpoint returned 401, and it was classified permanent).
  • Probed the live callback endpoint directly: an unauthenticated request returns the worker's own 401 body, proving the request is rejected by the internal API key middleware before the callbacks handler runs.
  • Post deploy smoke (fire one trigger, confirm the request flips to success): pending deploy.

Automated unit and integration tests are included in the diff and not listed here per the template.

Visual Changes

N/A

Reviewer Notes

  • The only functional change is an early return next() for the /api/callbacks/ prefix at the top of internalApiMiddleware.
  • Security scope: the only route under /api/callbacks is the callback handler, which always validates the scoped token (missing headers return 400, bad token returns 401, session mismatch returns 403). No unauthenticated route is exposed.
  • Path match: under Hono app.route() mounting, c.req.path is the full request path (verified for hono 4.12.18), so the prefix check matches. The trailing slash avoids matching an unrelated path such as /api/callbacksX.
  • Blast radius: the /api router otherwise contains only trigger CRUD routes, which still require the internal API key. The new unit tests assert both that callbacks pass without the key and that trigger routes still reject without it.
  • Why this was not caught before: the prior callbacks test exercised the router in isolation, never mounted under /api, so the middleware interaction was untested. The new tests mount the routers the way the worker does.

…oken-authed callbacks

  Completion callbacks authenticate with a scoped X-Callback-Token, validated by
  the callbacks handler. They were never meant to present the internal API key.
  But because /api/callbacks is mounted under /api, the internalApiMiddleware
  /api/* matcher rejected them with 401 before they reached that handler, leaving
  the webhook request stuck 'inprogress' forever (cloud-agent-next treats 401 as
  a permanent, non-retryable delivery failure).

  Let the /api/callbacks subtree skip the internal-key check so its own
  scoped-token auth runs. The route remains authenticated — just by the token it
  already carries, not the internal key. URL is unchanged, so callback URLs
  already persisted in in-flight cloud-agent-next session DOs keep working (no
  transition window).

  Adds unit tests that mount the routers as index.ts does (the prior isolated
  callbacks test could not catch this) and a real-worker integration regression.
@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented Jun 3, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Executive Summary

The fix correctly exempts /api/callbacks/* from the internal-key middleware by checking c.req.path before the key validation, and the root cause (Hono's api.use('*', ...) wildcard firing even for paths that don't match any route in the sub-app) is well-documented with regression tests at both unit and real-worker integration levels.

Files Reviewed (3 files)
  • services/webhook-agent-ingest/src/util/auth.ts — core fix: early-return exemption for /api/callbacks/ subtree
  • services/webhook-agent-ingest/src/util/auth.test.ts — new unit tests mounting routers as index.ts does, covering the exemption and the unchanged protected paths
  • services/webhook-agent-ingest/test/integration/routes.test.ts — real-worker regression test confirming callbacks bypass the internal-key gate and reach the handler (evidenced by the handler's own 400 response)

Fix these issues in Kilo Cloud


Reviewed by claude-4.6-sonnet-20260217 · 488,913 tokens

Review guidance: REVIEW.md from base branch main

@St0rmz1 St0rmz1 merged commit 5ad4ea6 into main Jun 3, 2026
16 checks passed
@St0rmz1 St0rmz1 deleted the fix/webhook-callback-auth-401 branch June 3, 2026 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants