fix(cloud-tests): graceful manual-step fallback so auto-remediate never shows raw errors by tofikwest · Pull Request #2915 · trycompai/comp

tofikwest · 2026-05-22T15:30:08Z

Summary

Customers were seeing a raw "Fix could not be applied — <cryptic error>" message in the Auto-Remediate dialog when the AI's refined plan was rejected by our pre-execution validator, or when AWS rejected a step the executor couldn't auto-repair. This PR converts every such failure path inside AWS executeRemediation into a graceful manual-steps fallback: the API returns real, customer-actionable instructions (AI-generated from the failure context), the trigger task carries them through, and the dialog renders the existing guided-steps UI instead of a red error banner.

Net effect: every fix attempt now ends in either "fix worked" or "here's a concrete checklist you can follow in AWS Console / CLI". No raw errors.

How it works (end-to-end)

┌─ executeRemediation (AWS) ────────────────────────────────────────────┐
│  read-step validation fails       →  manual-steps fallback            │
│  refined plan has no fix steps    →  manual-steps fallback            │
│  refined plan fails validation    →  try AI step-repair → revalidate  │
│                                   →  still invalid? manual fallback   │
│  executor returns error           →  permission error? existing UX    │
│                                   →  otherwise: manual fallback       │
└───────────────────────────────────────────────────────────────────────┘

API returns { status: 'failed', guidedOnly: true, guidedSteps, error }
        ↓
classifyExecuteResult → { type: 'manual', reason, guidedSteps }
        ↓
remediate-single trigger task → progress.phase = 'manual' + guidedSteps
        ↓
RemediationDialog → switches preview into guidedOnly mode
        ↓
Customer sees: ordered numbered steps, NOT a raw error
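
The routing in the box above can be read as a small decision function. The sketch below is illustrative only: the `Failure` and `Route` types and the `routeFailure` name are hypothetical and do not appear in the PR, where the logic presumably lives inline in `executeRemediation`.

```typescript
// Hypothetical model of the failure paths listed in the diagram.
type Failure =
  | { kind: 'read-validation' }
  | { kind: 'no-fix-steps' }
  | { kind: 'fix-validation'; validAfterRepair: boolean }
  | { kind: 'executor-error'; isPermissionError: boolean };

// Hypothetical outcome labels for each path.
type Route = 'manual-fallback' | 'continue-execution' | 'permission-ux';

function routeFailure(failure: Failure): Route {
  switch (failure.kind) {
    case 'read-validation':
    case 'no-fix-steps':
      // Nothing to repair on these paths: go straight to manual steps.
      return 'manual-fallback';
    case 'fix-validation':
      // One AI repair pass, then re-validate before giving up.
      return failure.validAfterRepair ? 'continue-execution' : 'manual-fallback';
    case 'executor-error':
      // Permission errors keep the existing fixScript UX.
      return failure.isPermissionError ? 'permission-ux' : 'manual-fallback';
  }
}
```

Every branch ends in one of three outcomes, and none of them is a raw error string, which matches the net effect stated in the summary.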

Changes

`apps/api/src/cloud-security/ai-remediation.service.ts`

New generateManualSteps(...) — Sonnet-powered. Inputs: finding, failed plan, failure reason. Output: { guidedSteps: string[], reason: string }. Hard fallback to the adapter's remediation text if the AI call itself throws.
Exports FindingContext for the orchestration layer.
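
A minimal sketch of the "hard fallback" behaviour described for `generateManualSteps`, under stated assumptions. The `FindingSummary` shape, the injected `askModel` callback, and the prompt wording are all hypothetical stand-ins; the real service calls Sonnet and takes a `FindingContext`. Treating an empty model answer the same as a thrown error is also an assumption of this sketch.

```typescript
// Hypothetical, simplified shape; the real FindingContext is not shown in the PR text.
interface FindingSummary {
  title: string;
  remediation: string; // adapter-provided remediation text
}

interface ManualSteps {
  guidedSteps: string[];
  reason: string;
}

// Stand-in for the model call; assumed to resolve to one string per step.
type AskModel = (prompt: string) => Promise<string[]>;

async function generateManualStepsSketch(
  finding: FindingSummary,
  failedPlan: unknown,
  failureReason: string,
  askModel: AskModel,
): Promise<ManualSteps> {
  try {
    const prompt = [
      `Finding: ${finding.title}`,
      `Failed plan: ${JSON.stringify(failedPlan)}`,
      `Failure reason: ${failureReason}`,
      'Write ordered manual steps for the AWS Console or CLI.',
    ].join('\n');
    const steps = (await askModel(prompt))
      .map((step) => step.trim())
      .filter((step) => step.length > 0);
    if (steps.length > 0) {
      return { guidedSteps: steps, reason: failureReason };
    }
  } catch {
    // The AI call itself failed; fall through to the hard fallback.
  }
  // Hard fallback: the adapter's remediation text becomes the single step.
  return { guidedSteps: [finding.remediation], reason: failureReason };
}
```

Because the function never rejects, callers on the failure path do not need their own error handling to avoid surfacing a raw error.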

`apps/api/src/cloud-security/aws-command-executor.ts`

looksLikeValidationError now matches MissingParameter, "must contain the parameter", "missing parameter", "parameter is required", "must specify". The earlier regex missed EC2-style wording and the AI step-repair never fired for those findings.
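
The newly matched wording can be sketched as a list of patterns. This is not the actual regex from `aws-command-executor.ts`, which is not shown here and presumably matches other patterns as well; the function name below is suffixed with `Sketch` to mark it as hypothetical.

```typescript
// Only the patterns named in this PR; any pre-existing patterns are omitted.
const NEW_VALIDATION_PATTERNS: RegExp[] = [
  /MissingParameter/,
  /must contain the parameter/i,
  /missing parameter/i,
  /parameter is required/i,
  /must specify/i,
];

function looksLikeValidationErrorSketch(message: string): boolean {
  // No global flag on the patterns, so RegExp.test is stateless here.
  return NEW_VALIDATION_PATTERNS.some((pattern) => pattern.test(message));
}
```

An EC2-style message such as "MissingParameter: The request must contain the parameter GroupId" would match, while an access-denied message would not.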

`apps/api/src/cloud-security/remediation.service.ts`

repairInvalidSteps — parses step indices from validator errors and calls refineStepFromError per offending step before falling back. Closes the gap where the executor's own AI step-repair never got a chance because the plan never reached execution.
respondWithManualSteps — generates manual steps, persists the action as failed, returns the response shape the frontend already renders for canAutoFix: false plans.
Every throw in executeRemediation swapped for the appropriate fallback. Permission errors still flow through the existing catch (don't shadow the polished fixScript UX).
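
The repair-then-revalidate pass can be sketched as follows. Several details are assumptions: the validator error format (`Step <index>: ...` with a zero-based index), the `FixStep` shape, and the synchronous `refine` callback standing in for `refineStepFromError`, which is presumably asynchronous in the real service.

```typescript
// Hypothetical step shape.
interface FixStep {
  command: string;
  params: Record<string, unknown>;
}

// Returns validator error messages; an empty array means the plan is valid.
type Validate = (steps: FixStep[]) => string[];
// Stand-in for refineStepFromError; returns null when no repair is possible.
type RefineStep = (step: FixStep, error: string) => FixStep | null;

// Assumed error format: "Step 0: missing parameter Name".
function parseStepIndices(errors: string[]): Map<number, string> {
  const byIndex = new Map<number, string>();
  for (const error of errors) {
    const match = /step\s+(\d+)/i.exec(error);
    if (match) byIndex.set(Number(match[1]), error);
  }
  return byIndex;
}

function repairInvalidStepsSketch(
  steps: FixStep[],
  errors: string[],
  refine: RefineStep,
  validate: Validate,
): { steps: FixStep[]; valid: boolean } {
  const offending = parseStepIndices(errors);
  // No parsable step index: nothing to target, so the caller falls back.
  if (offending.size === 0) return { steps, valid: false };
  const repaired = steps.map((step, index) => {
    const error = offending.get(index);
    if (error === undefined) return step;
    return refine(step, error) ?? step;
  });
  // Re-validate once; the caller falls back to manual steps when still invalid.
  return { steps: repaired, valid: validate(repaired).length === 0 };
}
```

Only the offending steps are sent for repair, and the plan gets exactly one repair pass before the manual-steps fallback takes over.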

`apps/api/src/cloud-security/ai-remediation.service.ts` (other change)

Broader ACTIONABLE_PREFIXES so security-group / IAM-style plans (Authorize/Revoke/Allow/Deny/Disable/Detach/Add/Remove/Register/Deregister/Tag/Untag) produce meaningful willChange diffs instead of {} {}.
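
As a rough illustration, a prefix list like this is typically checked against the command name of each step. The sketch below lists only the prefixes added in this PR; the constant and function names are hypothetical, and how the real code turns a match into a `willChange` diff is not shown in the PR text.

```typescript
// Prefixes added by this PR; pre-existing prefixes are omitted.
const ADDED_ACTIONABLE_PREFIXES = [
  'Authorize', 'Revoke', 'Allow', 'Deny', 'Disable', 'Detach',
  'Add', 'Remove', 'Register', 'Deregister', 'Tag', 'Untag',
] as const;

// Hypothetical helper: true when the command name starts with an actionable verb.
function hasAddedActionablePrefix(commandName: string): boolean {
  return ADDED_ACTIONABLE_PREFIXES.some((prefix) => commandName.startsWith(prefix));
}
```

With this list, a security-group step such as `AuthorizeSecurityGroupIngressCommand` counts as actionable, while a read-only `DescribeSecurityGroupsCommand` does not.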

`apps/app/src/trigger/tasks/cloud-security/execute-result.ts`

New manual classification + defensive parsing of guidedSteps (strips non-strings, requires guidedOnly: true AND a non-empty list).
Permission-error classification still wins when both fields are present.
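
The defensive parsing and precedence rules can be sketched like this. Only the `manual` variant and the `status`, `guidedOnly`, `guidedSteps`, and `error` fields come from the PR; the other classification variants, the `permissionError` field name, and the default reason string are assumptions made for the example.

```typescript
// Hypothetical classification union; only 'manual' is described in the PR.
type ExecuteClassification =
  | { type: 'success' }
  | { type: 'permission'; reason: string }
  | { type: 'manual'; reason: string; guidedSteps: string[] }
  | { type: 'failed'; reason: string };

// Everything is unknown until checked; permissionError is an assumed field name.
interface RawExecuteResult {
  status?: unknown;
  error?: unknown;
  guidedOnly?: unknown;
  guidedSteps?: unknown;
  permissionError?: unknown;
}

function classifyExecuteResultSketch(raw: unknown): ExecuteClassification {
  if (typeof raw !== 'object' || raw === null) {
    return { type: 'failed', reason: 'Unexpected response' };
  }
  const result = raw as RawExecuteResult;
  if (result.status === 'success') return { type: 'success' };
  const reason = typeof result.error === 'string' ? result.error : 'Fix could not be applied';
  // Permission errors win even when guided steps are also present.
  if (result.permissionError !== undefined && result.permissionError !== null) {
    return { type: 'permission', reason };
  }
  // Strip non-string entries before trusting the list.
  const guidedSteps = Array.isArray(result.guidedSteps)
    ? result.guidedSteps.filter((step): step is string => typeof step === 'string')
    : [];
  // Requires guidedOnly: true AND a non-empty list.
  if (result.guidedOnly === true && guidedSteps.length > 0) {
    return { type: 'manual', reason, guidedSteps };
  }
  return { type: 'failed', reason };
}
```

A response with `guidedOnly: true` but no usable steps is classified as a plain failure, so the dialog never switches to an empty checklist.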

`apps/app/src/trigger/tasks/cloud-security/remediate-single.ts`

New phase: 'manual' in progress + guidedSteps field.

`apps/app/src/app/(app)/[orgId]/cloud-tests/components/RemediationDialog.tsx`

On phase: 'manual', switch preview into guidedOnly: true rendering. Same UI the dialog already uses for canAutoFix: false plans.
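
One way to picture the dialog change is as a pure state update driven by the task's progress payload. In this sketch only `phase: 'manual'`, `guidedSteps`, and `guidedOnly` come from the PR; the other phase names, the `PreviewState` shape, and the function name are hypothetical, and the real component presumably does this inside React state handling.

```typescript
// Only 'manual' is confirmed by the PR; the other phase names are placeholders.
type Phase = 'running' | 'success' | 'error' | 'manual';

interface RemediationProgress {
  phase: Phase;
  guidedSteps?: string[];
}

// Hypothetical slice of the dialog's preview state.
interface PreviewState {
  guidedOnly: boolean;
  guidedSteps: string[];
}

function applyProgressToPreview(
  preview: PreviewState,
  progress: RemediationProgress,
): PreviewState {
  if (progress.phase === 'manual' && progress.guidedSteps && progress.guidedSteps.length > 0) {
    // Reuse the rendering already used for canAutoFix: false plans.
    return { ...preview, guidedOnly: true, guidedSteps: progress.guidedSteps };
  }
  return preview;
}
```

Any other phase leaves the preview untouched, so the happy path and the permission-error path render as before.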

Batch flows

cloud-tests/actions/batch-fix.ts + integrations/[slug]/actions/batch-fix.ts + remediate-batch-helpers.ts treat the manual classification as failed with the AI-generated reason. The per-finding guided steps remain available via the single-fix dialog.
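
The batch behaviour amounts to collapsing the `manual` classification into a failed row that carries the generated reason. A minimal sketch, assuming a hypothetical `BatchItemResult` shape and helper name that are not taken from the PR:

```typescript
// Reduced, hypothetical version of the classification union.
type Classification =
  | { type: 'success' }
  | { type: 'manual'; reason: string; guidedSteps: string[] }
  | { type: 'failed'; reason: string };

// Hypothetical per-finding row in the batch UI.
interface BatchItemResult {
  findingId: string;
  status: 'fixed' | 'failed';
  message?: string;
}

function toBatchItemResult(findingId: string, classification: Classification): BatchItemResult {
  if (classification.type === 'success') {
    return { findingId, status: 'fixed' };
  }
  // 'manual' and 'failed' both surface as failed; the guided steps are not
  // carried into the batch row and stay available in the single-fix dialog.
  return { findingId, status: 'failed', message: classification.reason };
}
```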

Tests

apps/api: 16 tests on ai-remediation.service.spec.ts (+4 new for generateManualSteps). 267/267 cloud-security tests pass.
apps/app: 10 tests on execute-result.test.ts (+5 new for the manual classification). All trigger task tests pass.

What this PR is NOT

NOT a per-finding audit. We have ~100+ finding types across AWS adapters; verifying each individually requires real-tenant testing and is weeks of work. This PR makes the safety net strong enough that the customer never sees a raw error regardless of which finding it is.
NOT a GCP/Azure parity change. GCP and Azure remediation services have the same throw-on-validation patterns (gcp-remediation.service.ts lines 200, 205, 208, 239, 288, 315; azure-remediation.service.ts lines 136, 149, 252) and would benefit from the same treatment. Left for a follow-up PR per the requested scope ("for now just do only for AWS").
NOT a fix for every cryptic auto-remediate error. The pattern broadening covers the common AWS error wording we've seen in customer reports; the universal AI step-repair is gated to validation-class errors. Errors AWS classifies as non-validation (e.g., MethodNotAllowed, ResourceConflict) will still bypass AI repair but now end up in the manual-steps fallback instead of as raw errors.

Manual test plan

Trigger an auto-fix on a finding known to hit the empty-required-param bug (CloudTrail "No trails configured" was the customer-reported case). Confirm the dialog shows manual steps instead of a red error.
Trigger an auto-fix on a finding that succeeds today. Confirm the happy path still completes and the success animation still renders.
Trigger an auto-fix that fails with a permission error. Confirm the permission-error UX (fixScript card) still renders — manual fallback should NOT shadow it.
Trigger a batch fix that includes findings that fall back to manual. Confirm the batch UI shows them as failed with the AI-generated reason.

🤖 Generated with Claude Code

Summary by cubic

Adds a graceful manual-steps fallback to AWS auto-remediation so users never see cryptic errors. When a plan is invalid or execution fails (except permission errors), the API returns guided steps and the dialog switches to the guided-only UI.

New Features
- Manual-steps fallback in executeRemediation: on read-step validation failure, empty fix steps, post-repair validation failure, or non-permission execution errors, return { guidedOnly: true, guidedSteps, error }.
- generateManualSteps builds clear, ordered instructions from the finding, failed plan, and failure reason, with a safe fallback to the adapter’s remediation text.
- Pre-execution repair: repairInvalidSteps parses validator errors, repairs offending steps with refineStepFromError, then re-validates before falling back.
- End-to-end surfacing: classifyExecuteResult emits type: 'manual'; remediate-single publishes phase: 'manual' with guidedSteps; RemediationDialog renders guided-only steps; batch-fix marks as failed with the generated reason. Permission errors keep the existing fix-script UX.
Bug Fixes
- Broader AWS validation-error detection (MissingParameter, “missing parameter”, “parameter is required”, “must specify”, etc.) so auto-repair paths trigger reliably.
- Expanded actionable prefixes (Authorize, Revoke, Allow, Deny, Disable, Detach, Add, Remove, Register, Deregister, Tag, Untag) for more informative willChange diffs.

Written for commit f6c7d94. Summary will update on new commits.

…coverage

Adds the building blocks for the manual-steps fallback shipped in the next two commits, plus broadens the pattern matcher and actionable-prefix list so more findings exercise the existing auto-repair paths instead of bailing out:

1. New `AiRemediationService.generateManualSteps(...)`: takes the finding, the failed plan, and the concrete failure reason, and returns real customer-facing manual instructions via Sonnet (kept on the cheap model since this only fires on failure paths and is plain natural language). Hard fallback to the adapter remediation text if the AI call itself throws, so the customer never sees a raw error.
2. `looksLikeValidationError` now matches `MissingParameter`, "must contain the parameter", "missing parameter", "parameter is required", "must specify". This covers the EC2-style error wording that the previous regex missed.
3. `ACTIONABLE_PREFIXES` adds `Authorize`, `Revoke`, `Allow`, `Deny`, `Disable`, `Detach`, `Add`, `Remove`, `Register`, `Deregister`, `Tag`, `Untag`. Security-group / IAM-style fix plans now produce meaningful `willChange` diffs instead of `{}` `{}`.
4. Exports `FindingContext` so it can be reused by the orchestration service (next commit) when invoking the new fallback path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Customers were seeing raw "Fix could not be applied — <cryptic error>" when the AI's refined plan failed pre-execution validation or AWS rejected a step the executor couldn't auto-repair. The fix swaps every throw inside executeRemediation for a graceful fallback that returns real, AI-generated manual instructions in the existing `canAutoFix:false` response shape, so the frontend renders them with the guided-steps UI it already supports.

Concrete changes inside the AWS executeRemediation flow:

- Hoist `findingCtx` once at the top of the function so the refineFixPlan call, the per-step repair callback, and the new fallback path all see the same context.
- Read-step validation failures → fall back to manual instead of throwing. (Read steps rarely fail; skipping repair here keeps the flow simple.)
- "Refined plan has no fix steps" → fall back to manual instead of throwing. There's nothing to repair.
- Refined-plan fix-step validation failures → NEW: attempt one AI repair pass on the offending steps (`repairInvalidSteps` parses the step indices from the validator errors and calls `refineStepFromError` per step), then re-validate. If still invalid, fall back to manual. Closes the gap where the executor's own AI step-repair never got a chance because the plan never reached execution.
- Executor returned an unrecoverable error → fall back to manual, except for permission errors, which still flow through the existing catch block (parseAwsPermissionError already has a polished fixScript payload; don't shadow it).

GCP and Azure remediation services have the same throw-on-validation patterns and would benefit from the same treatment; left for a follow-up PR per the original scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The API change in the previous commit returns `{ guidedOnly: true, guidedSteps, error }` when auto-fix gives up. This commit threads that response shape through the trigger-task progress metadata and the Remediation dialog so customers actually see the manual steps instead of a raw error.

- `classifyExecuteResult` recognizes the new shape and emits a `{ type: 'manual', reason, guidedSteps }` classification. Defensive parsing strips non-string entries and ignores `guidedOnly` without real steps. Permission errors keep their existing precedence.
- `remediateSingle` trigger task carries a new `phase: 'manual'` plus `guidedSteps` in its progress payload.
- `RemediationDialog` reacts to the new phase by switching its preview state into the existing guided-only rendering (same UI used for plans where the AI declared `canAutoFix: false` upfront).
- The two batch-fix paths (single-account + integrations) treat the manual classification as `failed` with the AI-generated reason. The batch UI doesn't render per-finding guided steps, but the user-facing message is now meaningful instead of cryptic. The per-finding manual steps remain available via the single-fix dialog.

8 new tests on `execute-result.test.ts` (10 total) cover the manual classification, the precedence rules, and the defensive parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-22T15:30:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
app	Ready	Preview, Comment	May 22, 2026 3:38pm
comp-framework-editor	Ready	Preview, Comment	May 22, 2026 3:38pm

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
portal	Skipped		May 22, 2026 3:38pm

cubic-dev-ai

No issues found across 11 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

claudfuen · 2026-05-22T15:56:00Z

🎉 This PR is included in version 3.63.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

tofikwest and others added 3 commits May 22, 2026 11:28

vercel Bot deployed to Preview – comp-framework-editor May 22, 2026 15:30 View deployment

vercel Bot deployed to Preview – app May 22, 2026 15:32 View deployment

Merge branch 'main' into tofik/auto-remediate-manual-fallback

f6c7d94

vercel Bot temporarily deployed to Preview – portal May 22, 2026 15:35 Inactive

cubic-dev-ai Bot reviewed May 22, 2026

vercel Bot deployed to Preview – comp-framework-editor May 22, 2026 15:36 View deployment

vercel Bot deployed to Preview – app May 22, 2026 15:38 View deployment

tofikwest merged commit 35af953 into main May 22, 2026
11 checks passed

tofikwest deleted the tofik/auto-remediate-manual-fallback branch May 22, 2026 15:39

claudfuen added the released label May 22, 2026

tofikwest merged 4 commits into main from tofik/auto-remediate-manual-fallback


2 participants
