App Hosting: auto-rollout silently not created for successful build (two silent skips in one day) #10320

@TytaniumDev

Summary

Firebase App Hosting sometimes does not create a rollout resource for a successful build. The build reaches `state: READY`, a Cloud Run revision is created and becomes `ContainerReady`, but no `rollout` resource appears under `/v1/.../backends//rollouts`. Traffic stays on the previous revision indefinitely until a manual `POST .../rollouts` call is made. We saw this happen twice in one day on the same backend — incidence rate ~18% — and the affected builds are field-for-field identical to the builds that auto-rolled out fine.

This looks similar to some older reports (for example #8866) but the symptom here is that the rollout resource itself is never created, not that a rollout is created and fails.

Environment

  • Project: `magic-bracket-simulator`
  • Backend: `api` (Next.js 15.5.7, `output: "standalone"`)
  • Region / location: `us-central1`
  • Billing: Blaze
  • Backend UID: `53b19782-1be2-4edf-9d66-8466a6d089b0`
  • rolloutPolicy: exactly `{ "codebaseBranch": "main" }` — no `disabled`, no `cooldownDuration`, no custom traffic config
  • firebase-tools: whatever `npx firebase-tools@latest` resolved to on 2026-04-10

Observed timeline (both silent skips on 2026-04-10 UTC)

Between 17:50 and 18:46 UTC I merged six PRs to `main` in rapid succession. All six auto-rolled out normally — each `rollout-` resource was created within ~60 ms of its corresponding `build-` resource. I verified this by diffing `createTime` fields across `/v1/.../builds` and `/v1/.../rollouts`.
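The build-to-rollout lag check can be reproduced with a short shell helper. This is a minimal sketch. It assumes GNU `date`, which parses RFC 3339 timestamps with fractional seconds and a trailing `Z`. It also assumes the two `createTime` strings have already been copied out of the `builds` and `rollouts` list responses. `to_epoch_ms` and `rollout_lag_ms` are hypothetical names, not part of any Firebase tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: milliseconds between a build's createTime and the
# createTime of the rollout that references it.
# Assumes GNU date, which accepts timestamps like 2026-04-10T21:30:05.888598146Z.
set -euo pipefail

to_epoch_ms() {
  # %s = whole seconds since the epoch; %3N = first 3 digits of the
  # nanosecond field (zero-padded). Concatenated, they give epoch milliseconds.
  date -u -d "$1" +%s%3N
}

rollout_lag_ms() {
  local build_create="$1" rollout_create="$2"
  echo $(( $(to_epoch_ms "$rollout_create") - $(to_epoch_ms "$build_create") ))
}
```

For example, `rollout_lag_ms 2026-04-10T18:46:46.000Z 2026-04-10T18:46:46.060Z` prints `60`. A build that never got a rollout has no second timestamp to compare, which is why the missing resource only shows up when you list both collections side by side.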

Then:

First silent skip — build `build-2026-04-10-009` (commit `a09a9e0017d796675f33f53f6540e17a71ed73df`, PR #152)

  • Reached `state: READY` normally
  • No rollout resource was created
  • Ambiguous, because build `build-2026-04-10-010` was created ~3 minutes later and may have superseded it

Second silent skip — build `build-2026-04-10-011`, build UID `c6128a89-baae-4b59-96bb-4e8e7414b584` (commit `f6467654a5236cf30f62a39d08afaea7bfcc075d`, PR #154)

  • `createTime: 2026-04-10T21:30:05.888598146Z`
  • Reached `state: READY` successfully
  • Cloud Run revision `api-build-2026-04-10-011` created at 21:34:29Z, became `Ready/Active/ContainerHealthy/ContainerReady` by 21:37:02Z
  • Cloud Run log: `Starting new instance. Reason: DEPLOYMENT_ROLLOUT - Instance started due to traffic shifting between revisions due to deployment, traffic split adjustment, or deployment health check.`
  • `GET /v1/.../backends/api/rollouts?pageSize=1000` (with `nextPageToken` null) returned no rollout referencing this build. Latest rollout was still `rollout-2026-04-10-009` from 18:46:46Z
  • No other build was created after this one, so supersession does not explain it
  • Traffic sat at 100% on `api-build-2026-04-10-010` for at least ~10 minutes, until the manual rollout below
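The "no rollout references this build" check can be sketched as a small shell function. It assumes the `rollouts` list response (for example from a `GET .../rollouts?pageSize=1000`) is already in a shell variable. It also assumes each rollout's `build` field holds the full build resource name, `projects/<project>/locations/<location>/backends/<backend>/builds/<build>`, which is the same form used in the manual `POST` body in this report. `build_has_rollout` is a hypothetical name, and this is a fixed-string match on the quoted name rather than a real JSON parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if a rollouts list response contains a rollout
# whose "build" field is exactly the given build resource name.
# Plain fixed-string match on the quoted name; not a JSON parser.
build_has_rollout() {
  local rollouts_json="$1" build_name="$2"
  # Including both surrounding quotes makes the match exact, so
  # ".../builds/build-2026-04-10-01" does not match ".../builds/build-2026-04-10-011".
  grep -qF -- "\"${build_name}\"" <<<"$rollouts_json"
}
```

Matching the quoted full resource name is a deliberate shortcut: it needs no `jq`, and the trailing quote prevents one build ID from matching as a prefix of another.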

Manual workaround that fixed it:

```bash
# Manually create the rollout that App Hosting never created for build-2026-04-10-011
curl -sS -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"build":"projects/magic-bracket-simulator/locations/us-central1/backends/api/builds/build-2026-04-10-011"}' \
  "https://firebaseapphosting.googleapis.com/v1/projects/magic-bracket-simulator/locations/us-central1/backends/api/rollouts?rolloutId=rollout-manual-cloudtasks"
```

Traffic shifted to `-011` within ~30 seconds of that call. No config changes, no code changes — the runtime path worked fine, only the rollout resource was missing.

What I ruled out

I did a read-only forensics pass before filing:

  • Rollout policy paused / disabled / rate-limited — no such fields exist on the backend, only `codebaseBranch: main`
  • Cooldown after N rapid rollouts — rollouts are created within 60 ms of build creation, not queued, and there was a 3-hour idle gap before `build-011` anyway
  • Build failure — `state: READY`, Cloud Run revision healthy, Cloud Build tags identical to healthy builds (`fah`, `p-fah`, `r-nodejs`, `b-nodejs_20260405_RC00`, `bt-LIFECYCLE`)
  • GitHub connection / repo link drift — both silent-skip builds have `source.codebase` populated correctly with `branch: main`, `hash: `, `uri: https://github.com/.../commit/`, and the right author
  • Quota denied — none in logs, billing is Blaze, Cloud Run revision was created successfully
  • maxInstances / traffic deadlock — traffic shifted in ~30 s once the manual `POST` was made
  • Firebase App Hosting API pagination issue — fetched with `pageSize=1000` and `nextPageToken` was null
  • Hanging long-running operation — `GET /v1/.../operations` showed no queued App Hosting ops for this backend beyond the one I created manually

The one thing I couldn't observe is the App Hosting control plane's internal `CreateRollout` decision — `AppHosting.CreateRollout` audit entries are only present in Cloud Audit Logs for the manual rollout I made, not for any of the six successful auto-rollouts earlier that day. So I can't distinguish "webhook never fired" from "webhook fired but `CreateRollout` returned an error" from "write dropped" without Google-side logs.

Fingerprint (for Google SRE)

Backend UID: `53b19782-1be2-4edf-9d66-8466a6d089b0`
Affected builds:

  • `build-2026-04-10-011` → UID `c6128a89-baae-4b59-96bb-4e8e7414b584` (high-confidence glitch)
  • `build-2026-04-10-009` → commit `a09a9e0017d796675f33f53f6540e17a71ed73df` (plausible glitch, can't rule out supersession)

Healthy auto-rollouts from the same backend on the same day, for reference: `rollout-2026-04-10-004` through `rollout-2026-04-10-009`, all created within ~60 ms of their paired builds.

Expected vs actual

Expected: a rollout resource is created automatically within ~60 ms of a successful build, same as every other deploy that day.

Actual: no rollout resource was created at all. The build sits `READY` and the revision sits healthy, but traffic does not shift. No user-visible error — `firebase-tools` is not involved in this path at all (it's purely the App Hosting control plane reacting to a GitHub push), so there's nowhere for an error to surface to the user except by staring at the rollouts list.

Workaround

Poll the `rollouts` endpoint after every push to `main` and create the rollout manually if it's missing. I implemented this as a GitHub Actions workflow with WIF-based auth and a dedicated `roles/firebaseapphosting.admin` service account. Happy to share the workflow if it's useful.
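The core of that poll-and-repair step can be sketched in shell using the same two v1 REST calls used for the manual fix. This is a sketch under stated assumptions, not the actual workflow. It assumes `gcloud` is already authenticated as an identity allowed to create App Hosting rollouts, that the caller already knows the build ID for the pushed commit, and that the backend has fewer than 1000 rollouts (a single page). `ensure_rollout` and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the poll-and-repair workaround: if no rollout
# references the given build, create one with the v1 rollouts.create call.
# Assumptions: gcloud is authenticated with permission to create rollouts,
# and all rollouts fit in one page (pageSize=1000, no nextPageToken handling).
set -euo pipefail

API=https://firebaseapphosting.googleapis.com/v1

ensure_rollout() {
  local project="$1" location="$2" backend="$3" build_id="$4" rollout_id="$5"
  local parent="projects/${project}/locations/${location}/backends/${backend}"
  local build_name="${parent}/builds/${build_id}"
  local token rollouts
  token="$(gcloud auth print-access-token)"

  # List existing rollouts for the backend.
  rollouts="$(curl -sS --fail \
    -H "Authorization: Bearer ${token}" \
    "${API}/${parent}/rollouts?pageSize=1000")"

  # Exact match on the quoted build resource name.
  if grep -qF -- "\"${build_name}\"" <<<"$rollouts"; then
    echo "rollout already exists for ${build_id}"
    return 0
  fi

  # No rollout references this build: create one, as in the manual fix.
  echo "no rollout for ${build_id}; creating ${rollout_id}"
  curl -sS --fail -X POST \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d "{\"build\":\"${build_name}\"}" \
    "${API}/${parent}/rollouts?rolloutId=${rollout_id}"
}
```

For example, `ensure_rollout magic-bracket-simulator us-central1 api build-2026-04-10-011 rollout-manual-011` (the rollout ID here is a made-up example) would have reproduced the manual fix for the second skip, and is a no-op once any rollout references that build. A production workflow would also follow `nextPageToken` rather than relying on a single page.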

What would help

  • Any visibility the App Hosting control plane has into why `CreateRollout` did or did not fire for `build-2026-04-10-011` (backend UID above, build UID above)
  • Knowing whether this correlates with a recent App Hosting control-plane deploy
  • A way to surface "build is READY but no rollout exists" as a user-visible warning in the Firebase console, since today there's no indication anything is wrong
  • Confirmation this is a known class of bug so I know whether to invest more in the workaround or remove it

Thanks — happy to provide more data or try reproductions if useful.
