Skip to content

feat(auth): prompt for recovery on startup#92

Open
ndycode wants to merge 1 commit intogit-plan/02-backup-restore-managerfrom
git-plan/21-startup-recovery-prompt-clean
Open

feat(auth): prompt for recovery on startup#92
ndycode wants to merge 1 commit intogit-plan/02-backup-restore-managerfrom
git-plan/21-startup-recovery-prompt-clean

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 14, 2026

Summary

  • prompt for named-backup recovery before OAuth when interactive codex auth login starts with zero saved accounts and recoverable backups exist
  • keep the startup recovery path available after backing out of the backup browser, cancelling restore confirmation, or hitting restore-time errors, while still suppressing the prompt after same-session fresh and reset actions
  • reuse the pre-scanned backup assessments during startup restore flow so backup discovery avoids a second pass and the recovery helper stays directly covered by active tests

Validation

  • npm run typecheck
  • npm run lint
  • npm run build
  • npx vitest run test/storage.test.ts
  • npx vitest run test/recovery.test.ts -t getActionableNamedBackupRestores
  • cmd /c "set NODE_OPTIONS=--max-old-space-size=16384&& npx vitest run test/codex-manager-cli.test.ts -t startup"
  • npx vitest run test/codex-manager-cli.test.ts -t "returns to the login menu when backup reassessment fails before confirmation"

Summary by cubic

Adds an interactive startup recovery prompt to codex auth login that offers restoring a named backup before OAuth when no accounts are saved and recoverable backups exist. Reduces duplicate backup scanning, improves interactive detection, and updates docs and tests.

  • New Features

    • Show named-backup recovery before OAuth only when the session starts with zero saved accounts and at least one recoverable backup.
    • Keep the recovery path available after backing out, canceling confirmation, or restore-time errors; suppress it after same-session fresh/reset and in non-interactive/fallback flows.
    • Updated docs in Getting Started, Troubleshooting, and Upgrade to explain the new prompt.
  • Refactors

    • Reuse a single backup scan via getActionableNamedBackupRestores and resolveStartupRecoveryAction to avoid a second discovery pass and tighten test coverage.
    • Centralized interactive checks with isInteractiveLoginMenuAvailable() and improved error handling by rethrowing unreadable backup directory errors consistently.

Written for commit 388f263. Summary will update on new commits.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this PR adds a startup recovery prompt to codex auth login that offers named-backup restore before OAuth when the session starts with zero accounts and at least one actionable backup exists. the feature is well-scoped — prompt suppression after fresh/reset, cached scan reuse, and re-prompt on back-out are all covered by new vitest cases.

two unintended regressions were introduced alongside the feature:

  • isRetryableFilesystemErrorCode now retries EPERM on all platforms (lib/storage.ts). the original code explicitly restricted EPERM retries to process.platform === "win32" because on linux/macOS it is a permanent "operation not permitted" error. the new one-liner drops that guard, meaning real unix permission failures silently burn through 4 retries (~150 ms) before surfacing. this is the opposite of the windows-concurrency safety that this codebase targets.
  • modeTouchesQuotaCache await removed (lib/codex-manager.ts). the deleted block was the sole synchronisation point that prevented an in-flight background pendingMenuQuotaRefresh from racing with quota-cache-touching operations (check, deep-check, forecast, fix). on windows, two concurrent writeFile calls to the same cache file will trigger EBUSY or silent corruption. the test that reproduced this race ("waits for an in-flight menu quota refresh before starting quick check") was also deleted without explanation.

additional minor points:

  • retry jitter (Math.random() * 10) was dropped, making synchronised retry storms more likely on windows with av-locked files
  • getRedactedFilesystemErrorLabel in codex-manager.ts is a byte-for-byte duplicate of getBackupRestoreAssessmentErrorLabel in storage.ts
  • resolveStartupRecoveryAction is exported but has no direct unit tests; edge cases are covered only through the integration suite

Confidence Score: 2/5

  • two real regressions in production paths — not safe to merge without fixes
  • the startup recovery feature itself is solid and well-tested, but two incidental changes introduce genuine bugs: EPERM is now retried on linux/macOS (a permanent error there), and the quota-cache synchronisation guard was silently removed alongside its only regression test. both affect code paths that run on every codex auth login invocation, not just the new recovery flow.
  • lib/storage.ts (EPERM retry regression, jitter removal) and lib/codex-manager.ts (quota-cache await guard removal)

Important Files Changed

Filename Overview
lib/storage.ts introduces getActionableNamedBackupRestores, scanNamedBackups, listNamedBackupsWithoutLoading, and assessNamedBackupRestoreCandidate; contains two regressions: EPERM is now retried on all platforms (should be Windows-only) and retry jitter was silently dropped
lib/codex-manager.ts adds startup recovery prompt flow with suppression flags and cached recovery state reuse; removes modeTouchesQuotaCache await guard without explanation, introducing a quota-cache read/write race for check/deep-check/forecast/fix actions; also duplicates getRedactedFilesystemErrorLabel already present in storage.ts
lib/cli.ts adds isInteractiveLoginMenuAvailable() helper combining !isNonInteractiveMode() && isTTY(); replaces bare isTTY() guard in promptLoginMode — clean, straightforward change
test/codex-manager-cli.test.ts comprehensive startup-recovery test matrix covering actionable backups, declines, back-out, re-prompt, EBUSY on scan/restore/reassessment, suppress-after-fresh/reset, and non-interactive skips; "waits for in-flight menu quota refresh before starting quick check" test was deleted alongside the guard it covered
test/recovery.test.ts reformats to tabs, adds afterEach(vi.restoreAllMocks), adds removeWithRetry Windows-safe cleanup helper, and new getActionableNamedBackupRestores (override) suite — no issues

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([runAuthLogin]) --> B[loadAccounts]
    B --> C{accounts > 0\nor allowEmptyStorageMenu?}
    C -- yes --> D[inner menu loop\npromptLoginMode]
    D --> E{menuResult.mode}
    E -- fresh/reset --> F[run action\nsuppressRecoveryPrompt=true\ncontinue loginFlow]
    F --> B
    E -- cancel --> G[exit to OAuth path]
    E -- other --> D
    C -- no --> G
    G --> H{canPromptForRecovery?\nnot suppressed, not attempted\ncount=0, interactive}
    H -- no --> L[runOAuthFlow]
    H -- yes --> I[getActionableNamedBackupRestores]
    I -- error --> J[warn + continue\nto OAuth]
    J --> L
    I -- 0 actionable\n0 scan error --> K[allowEmptyStorageMenu=true\ncontinue loginFlow]
    K --> B
    I -- actionable backups --> M[confirm: Restore now?]
    M -- no --> L
    M -- yes --> N[runBackupRestoreManager\nwith cached allAssessments]
    N -- restored --> O[continue loginFlow]
    O --> B
    N -- dismissed --> P[pendingRecoveryState=cached\nrecoveryPromptAttempted=false\ncontinue loginFlow]
    P --> B
    L --> Q{OAuth result}
    Q -- success --> R[promptAddAnotherAccount loop]
    Q -- failed/cancel --> S([exit 0])
    R --> S
Loading

Comments Outside Diff (1)

  1. lib/storage.ts, line 683 (link)

    EPERM now retried on all platforms — breaks intentional Windows-only guard

    the old code explicitly restricted EPERM retries to process.platform === "win32" because on linux/macOS EPERM is a hard, permanent "operation not permitted" error (not a transient AV/lock race). the new one-liner silently retries EPERM on all platforms, meaning a real linux permission failure will burn through 4 retries (~150ms of wasted exponential backoff) before surfacing. this is a regression for the vast majority of non-windows deployments and also obscures the original intent.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: lib/storage.ts
    Line: 683
    
    Comment:
    **EPERM now retried on all platforms — breaks intentional Windows-only guard**
    
    the old code explicitly restricted EPERM retries to `process.platform === "win32"` because on linux/macOS `EPERM` is a hard, permanent "operation not permitted" error (not a transient AV/lock race). the new one-liner silently retries EPERM on all platforms, meaning a real linux permission failure will burn through 4 retries (~150ms of wasted exponential backoff) before surfacing. this is a regression for the vast majority of non-windows deployments and also obscures the original intent.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Codex

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: lib/storage.ts
Line: 683

Comment:
**EPERM now retried on all platforms — breaks intentional Windows-only guard**

the old code explicitly restricted EPERM retries to `process.platform === "win32"` because on linux/macOS `EPERM` is a hard, permanent "operation not permitted" error (not a transient AV/lock race). the new one-liner silently retries EPERM on all platforms, meaning a real linux permission failure will burn through 4 retries (~150ms of wasted exponential backoff) before surfacing. this is a regression for the vast majority of non-windows deployments and also obscures the original intent.

```suggestion
	return (code === "EPERM" && process.platform === "win32") || code === "EBUSY" || code === "EAGAIN";
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: lib/storage.ts
Line: 696

Comment:
**Jitter removed from retry backoff**

the jitter (`Math.floor(Math.random() * 10)`) was there to prevent synchronized retry storms when multiple processes hit the same AV-locked file on windows at the same instant. without it, all retrying callers will wake at exactly `10ms`, `20ms`, `40ms`, `80ms` — maximizing lock contention precisely where windows is most fragile. low cost to restore:

```suggestion
			await new Promise((resolve) => setTimeout(resolve, 10 * 2 ** attempt + Math.floor(Math.random() * 10)));
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: lib/codex-manager.ts
Line: 3912

Comment:
**`modeTouchesQuotaCache` await guard removed — quota-cache race introduced**

the deleted block waited for any in-flight background `pendingMenuQuotaRefresh` before the user entered a quota-cache-touching action (`check`, `deep-check`, `forecast`, `fix`). that was the only synchronisation point preventing a concurrent background write to the on-disk quota cache while one of those operations was also reading and rewriting it. the test `"waits for an in-flight menu quota refresh before starting quick check"` was also removed in this PR, so the race has no regression coverage. on windows, two concurrent `writeFile` calls to the same cache file reliably corrupt it or trigger `EBUSY` — the exact scenario `retryTransientFilesystemOperation` is meant to handle.

if the await was removed intentionally (e.g. because the quota-cache operations now use transactions) please add a comment documenting why the race is safe; otherwise this guard should be restored.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: lib/codex-manager.ts
Line: 352-361

Comment:
**Duplicate error-label helper — `getBackupRestoreAssessmentErrorLabel` in `storage.ts` is identical**

`getRedactedFilesystemErrorLabel` here and `getBackupRestoreAssessmentErrorLabel` in `lib/storage.ts` are byte-for-byte identical. exporting the storage one and reusing it in `codex-manager.ts` would keep the redaction logic in one place and make future changes atomic.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 388f263

Greptile also left 1 inline comment on this PR.

Context used:

  • Rule used - What: Every code change must explain how it defend... (source)

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 14, 2026

Warning

Rate limit exceeded

@ndycode has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 11 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 00dd4ab6-e5a7-4c7e-8523-5d33b26812d6

📥 Commits

Reviewing files that changed from the base of the PR and between c6ce2b7 and 388f263.

📒 Files selected for processing (9)
  • docs/getting-started.md
  • docs/troubleshooting.md
  • docs/upgrade.md
  • lib/cli.ts
  • lib/codex-manager.ts
  • lib/storage.ts
  • test/codex-manager-cli.test.ts
  • test/recovery.test.ts
  • test/storage.test.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch git-plan/21-startup-recovery-prompt-clean
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@@ -3868,17 +3912,6 @@ async function runAuthLogin(): Promise<number> {
console.log("Cancelled.");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modeTouchesQuotaCache await guard removed — quota-cache race introduced

the deleted block waited for any in-flight background pendingMenuQuotaRefresh before the user entered a quota-cache-touching action (check, deep-check, forecast, fix). that was the only synchronisation point preventing a concurrent background write to the on-disk quota cache while one of those operations was also reading and rewriting it. the test "waits for an in-flight menu quota refresh before starting quick check" was also removed in this PR, so the race has no regression coverage. on windows, two concurrent writeFile calls to the same cache file reliably corrupt it or trigger EBUSY — the exact scenario retryTransientFilesystemOperation is meant to handle.

if the await was removed intentionally (e.g. because the quota-cache operations now use transactions) please add a comment documenting why the race is safe; otherwise this guard should be restored.

Prompt To Fix With AI
This is a comment left during a code review.
Path: lib/codex-manager.ts
Line: 3912

Comment:
**`modeTouchesQuotaCache` await guard removed — quota-cache race introduced**

the deleted block waited for any in-flight background `pendingMenuQuotaRefresh` before the user entered a quota-cache-touching action (`check`, `deep-check`, `forecast`, `fix`). that was the only synchronisation point preventing a concurrent background write to the on-disk quota cache while one of those operations was also reading and rewriting it. the test `"waits for an in-flight menu quota refresh before starting quick check"` was also removed in this PR, so the race has no regression coverage. on windows, two concurrent `writeFile` calls to the same cache file reliably corrupt it or trigger `EBUSY` — the exact scenario `retryTransientFilesystemOperation` is meant to handle.

if the await was removed intentionally (e.g. because the quota-cache operations now use transactions) please add a comment documenting why the race is safe; otherwise this guard should be restored.

How can I resolve this? If you propose a fix, please make it concise.

Fix in Codex

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 9 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="lib/codex-manager.ts">

<violation number="1" location="lib/codex-manager.ts:4410">
P3: Do not collapse backup-directory read errors into `[]`; it makes the caller emit a false “No named backups found” result.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

): Promise<BackupRestoreManagerResult> {
const backupDir = getNamedBackupsDirectoryPath();
const assessments =
assessmentsOverride ?? (await loadBackupRestoreManagerAssessments());
Copy link

@cubic-dev-ai cubic-dev-ai bot Mar 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Do not collapse backup-directory read errors into []; it makes the caller emit a false “No named backups found” result.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/codex-manager.ts, line 4410:

<comment>Do not collapse backup-directory read errors into `[]`; it makes the caller emit a false “No named backups found” result.</comment>

<file context>
@@ -4266,6 +4398,21 @@ async function runBackupRestoreManager(
+): Promise<BackupRestoreManagerResult> {
+	const backupDir = getNamedBackupsDirectoryPath();
+	const assessments =
+		assessmentsOverride ?? (await loadBackupRestoreManagerAssessments());
+	if (assessments.length === 0) {
+		console.log(`No named backups found. Place backup files in ${backupDir}.`);
</file context>
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant