Skip to content

fix(cloud-task): refresh run status before emitting watcher error#2201

Closed
thmsobrmlr wants to merge 1 commit into
mainfrom
posthog-code/fix-stale-cloud-task-status-on-watcher-fail
Closed

fix(cloud-task): refresh run status before emitting watcher error#2201
thmsobrmlr wants to merge 1 commit into
mainfrom
posthog-code/fix-stale-cloud-task-status-on-watcher-fail

Conversation

@thmsobrmlr
Copy link
Copy Markdown
Contributor

@thmsobrmlr thmsobrmlr commented May 18, 2026

TL;DR: when letting posthog code run for a while, there are a lot of pulsating cloud icons. looks like this is because the last status is cached and terminated states always fall back to this

Summary

When a cloud-task watcher fails (e.g. SSE reconnect attempts exhausted, network drop), the renderer's handleCloudTaskUpdate only flips session.status to "error" — it never touches session.cloudStatus. Since taskRunStatus = session?.cloudStatus ?? task.latest_run?.status, the in-memory session keeps overriding the API value, so the pulsing cloud icon in the sidebar stays stuck on "in_progress" for the rest of the app session even after the run actually finished.

This change makes the main-process watcher poll the run endpoint one final time before emitting the error event. If the run has reached a terminal state in the meantime, a kind: "status" event fires first so the renderer can settle the icon on the real outcome (green tick / red cross) instead of forever-pulsing. Auth/access failures (401/403/404) skip the refresh — the same call would just fail again.

  • failWatcher now dispatches to a new refreshStatusThenEmitError
  • fetchTaskRunSilent returns { run, httpStatus } so the final poll can't re-enter watcher teardown; fetchTaskRun delegates to it
  • Updated the existing "repeated stream failures" test for the extra netFetch call
  • Added a test for the terminal-status-before-error ordering
  • Added a test verifying auth failures skip the refresh

Test plan

  • pnpm --filter code typecheck
  • pnpm vitest run src/main/services/cloud-task/service.test.ts — 15/15 pass
  • pnpm vitest run src/renderer/features/sessions/service/service.test.ts — 86/86 pass
  • Manual: trigger a cloud task that completes while the watcher is reconnecting (e.g. kill the SSE connection mid-run) and confirm the sidebar icon settles on the terminal state instead of staying pulsing

When a cloud task watcher fails (e.g. SSE reconnect attempts exhausted),
poll the run endpoint one last time and emit a final status update before
the error event. The run has usually already reached a terminal state by
the time we give up reconnecting, so this lets the sidebar's pulsing cloud
icon settle on the real outcome instead of staying stuck on "in_progress"
for the rest of the app session.

Auth/access failures (401/403/404) skip the refresh — the same call would
just fail again and recurse through shouldFailWatcherForFetchStatus.

Generated-By: PostHog Code
Task-Id: d4fa4ff5-29d4-463a-9df0-b2fe603157a6
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/code/src/main/services/cloud-task/service.ts:1312-1318
`isAuthFailureError` duplicates the exact title strings defined inside `createStreamStatusError`, violating OnceAndOnlyOnce. If any of those three titles are ever updated in `createStreamStatusError`, `isAuthFailureError` silently returns `false` for what is actually an auth failure, causing an unnecessary `fetchTaskRunSilent` call. Since `CloudTaskConnectionError` is an interface you own, the simplest fix is to add an optional `httpStatus?: number` field to it — `createStreamStatusError` already receives the status code and can populate it, and `isAuthFailureError` can then check `[401, 403, 404].includes(error.httpStatus ?? 0)` instead of comparing title strings.

### Issue 2 of 2
apps/code/src/main/services/cloud-task/service.test.ts:740-789
The auth-skip test only exercises the 403 path, but `isAuthFailureError` covers three distinct HTTP statuses (401, 403, 404). Per the team's preference for parameterised tests, all three cases should be verified — a regression in any one of them would go undetected. Consider using `it.each([[401], [403], [404]])` (or Vitest's `test.each`) to run the same assertions against all three codes.

Reviews (1): Last reviewed commit: "fix(cloud-task): refresh run status befo..." | Re-trigger Greptile

Comment on lines +1312 to +1318
function isAuthFailureError(error: CloudTaskConnectionError): boolean {
return (
error.title === "Cloud authentication expired" ||
error.title === "Cloud access denied" ||
error.title === "Cloud run not found"
);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 isAuthFailureError duplicates the exact title strings defined inside createStreamStatusError, violating OnceAndOnlyOnce. If any of those three titles are ever updated in createStreamStatusError, isAuthFailureError silently returns false for what is actually an auth failure, causing an unnecessary fetchTaskRunSilent call. Since CloudTaskConnectionError is an interface you own, the simplest fix is to add an optional httpStatus?: number field to it — createStreamStatusError already receives the status code and can populate it, and isAuthFailureError can then check [401, 403, 404].includes(error.httpStatus ?? 0) instead of comparing title strings.

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/cloud-task/service.ts
Line: 1312-1318

Comment:
`isAuthFailureError` duplicates the exact title strings defined inside `createStreamStatusError`, violating OnceAndOnlyOnce. If any of those three titles are ever updated in `createStreamStatusError`, `isAuthFailureError` silently returns `false` for what is actually an auth failure, causing an unnecessary `fetchTaskRunSilent` call. Since `CloudTaskConnectionError` is an interface you own, the simplest fix is to add an optional `httpStatus?: number` field to it — `createStreamStatusError` already receives the status code and can populate it, and `isAuthFailureError` can then check `[401, 403, 404].includes(error.httpStatus ?? 0)` instead of comparing title strings.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +740 to +789
await vi.advanceTimersByTimeAsync(70_000);
await waitFor(
() =>
updates.some(
(u) =>
typeof u === "object" &&
u !== null &&
(u as { kind?: string }).kind === "error",
),
10_000,
);

const statusIdx = updates.findIndex(
(u) =>
typeof u === "object" &&
u !== null &&
(u as { kind?: string }).kind === "status" &&
(u as { status?: string }).status === "completed",
);
const errorIdx = updates.findIndex(
(u) =>
typeof u === "object" &&
u !== null &&
(u as { kind?: string }).kind === "error",
);

expect(statusIdx).toBeGreaterThanOrEqual(0);
expect(errorIdx).toBeGreaterThanOrEqual(0);
expect(statusIdx).toBeLessThan(errorIdx);
expect(updates[statusIdx]).toMatchObject({
taskId: "task-1",
runId: "run-1",
kind: "status",
status: "completed",
output: { pr_url: "https://example.com/pr/1" },
});
});

it("skips the status refresh when the failure is an auth/access error", async () => {
const updates: unknown[] = [];
service.on(CloudTaskEvent.Update, (payload) => updates.push(payload));

mockNetFetch.mockResolvedValueOnce(
createJsonResponse({ detail: "Forbidden" }, 403),
);

service.watch({
taskId: "task-1",
runId: "run-1",
apiHost: "https://app.example.com",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The auth-skip test only exercises the 403 path, but isAuthFailureError covers three distinct HTTP statuses (401, 403, 404). Per the team's preference for parameterised tests, all three cases should be verified — a regression in any one of them would go undetected. Consider using it.each([[401], [403], [404]]) (or Vitest's test.each) to run the same assertions against all three codes.

Context Used: Do not attempt to comment on incorrect alphabetica... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: apps/code/src/main/services/cloud-task/service.test.ts
Line: 740-789

Comment:
The auth-skip test only exercises the 403 path, but `isAuthFailureError` covers three distinct HTTP statuses (401, 403, 404). Per the team's preference for parameterised tests, all three cases should be verified — a regression in any one of them would go undetected. Consider using `it.each([[401], [403], [404]])` (or Vitest's `test.each`) to run the same assertions against all three codes.

**Context Used:** Do not attempt to comment on incorrect alphabetica... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))

How can I resolve this? If you propose a fix, please make it concise.

@thmsobrmlr thmsobrmlr closed this May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant