fix(cloud-tests): graceful manual-step fallback so auto-remediate never shows raw errors #2915
Merged
…coverage
Adds the building blocks for the manual-steps fallback shipped in the
next two commits, plus broadens the pattern matcher and actionable-
prefix list so more findings exercise the existing auto-repair paths
instead of bailing out:
1. New `AiRemediationService.generateManualSteps(...)`: takes the
finding, the failed plan, and the concrete failure reason, and
returns real customer-facing manual instructions via Sonnet (kept
on the cheap model since this only fires on failure paths and is
plain natural language). Hard fallback to the adapter remediation
text if the AI call itself throws, so the customer never sees a
raw error.
2. `looksLikeValidationError` now matches `MissingParameter`,
"must contain the parameter", "missing parameter",
"parameter is required", "must specify" — covers the EC2-style
error wording that the previous regex missed.
3. `ACTIONABLE_PREFIXES` adds `Authorize`, `Revoke`, `Allow`, `Deny`,
`Disable`, `Detach`, `Add`, `Remove`, `Register`, `Deregister`,
`Tag`, `Untag`. Security-group / IAM-style fix plans now produce
meaningful `willChange` diffs instead of `{}` `{}`.
4. Exports `FindingContext` so it can be reused by the orchestration
service (next commit) when invoking the new fallback path.
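Items 2 and 3 can be illustrated with a minimal sketch. It assumes the matcher is a simple list of patterns and that prefixes are checked with `startsWith`; the actual regex and the pre-existing entries of `ACTIONABLE_PREFIXES` are not shown in this commit message, so `VALIDATION_ERROR_PATTERNS`, `ADDED_ACTIONABLE_PREFIXES`, and `hasAddedActionablePrefix` are hypothetical names used only for illustration.

```typescript
// Hypothetical pattern list built from the phrases named in item 2.
// The real regex in the executor may be structured differently.
const VALIDATION_ERROR_PATTERNS: RegExp[] = [
  /MissingParameter/,
  /must contain the parameter/i,
  /missing parameter/i,
  /parameter is required/i,
  /must specify/i,
];

// Returns true when an error message reads like a parameter-validation failure.
function looksLikeValidationError(message: string): boolean {
  return VALIDATION_ERROR_PATTERNS.some((pattern) => pattern.test(message));
}

// Only the prefixes this commit adds (item 3); earlier entries are not listed in the source.
const ADDED_ACTIONABLE_PREFIXES = [
  "Authorize", "Revoke", "Allow", "Deny", "Disable", "Detach",
  "Add", "Remove", "Register", "Deregister", "Tag", "Untag",
] as const;

// Assumption: a command counts as actionable when its name starts with a listed prefix.
function hasAddedActionablePrefix(command: string): boolean {
  return ADDED_ACTIONABLE_PREFIXES.some((prefix) => command.startsWith(prefix));
}
```

Under these assumptions, a read-only command such as `DescribeSecurityGroups` does not match any of the added prefixes, while `AuthorizeSecurityGroupIngress` does.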
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Customers were seeing raw "Fix could not be applied — <cryptic error>" when the AI's refined plan failed pre-execution validation or AWS rejected a step the executor couldn't auto-repair. The fix swaps every throw inside `executeRemediation` for a graceful fallback that returns real, AI-generated manual instructions in the existing `canAutoFix:false` response shape, so the frontend renders them with the guided-steps UI it already supports.
Concrete changes inside the AWS `executeRemediation` flow:
- Hoist `findingCtx` once at the top of the function so the `refineFixPlan` call, the per-step repair callback, and the new fallback path all see the same context.
- Read-step validation failures → fall back to manual instead of throwing. (Read steps rarely fail; skipping repair here keeps the flow simple.)
- "Refined plan has no fix steps" → fall back to manual instead of throwing. There's nothing to repair.
- Refined-plan fix-step validation failures → NEW: attempt one AI repair pass on the offending steps (`repairInvalidSteps` parses the step indices from the validator errors and calls `refineStepFromError` per step), then re-validate. If still invalid, fall back to manual. Closes the gap where the executor's own AI step-repair never got a chance because the plan never reached execution.
- Executor returned an unrecoverable error → fall back to manual, except for permission errors, which still flow through the existing catch block (`parseAwsPermissionError` already has a polished fixScript payload; don't shadow it).
GCP and Azure remediation services have the same throw-on-validation patterns and would benefit from the same treatment; left for a follow-up PR per the original scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
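The validate, repair once, re-validate, fall back sequence described in this commit can be sketched roughly as below. This is a simplified, synchronous illustration: the real `repairInvalidSteps` and `refineStepFromError` call an AI model and are presumably asynchronous, and the validator's error format (`step <index>: <message>`, zero-based) is an assumption. `FixStep`, `Outcome`, `parseStepIndex`, and `planOrFallback` are hypothetical names.

```typescript
interface FixStep {
  command: string;
  params: Record<string, unknown>;
}

// Hypothetical signatures standing in for the validator and the per-step AI repair.
type Validate = (steps: FixStep[]) => string[];
type RepairStep = (step: FixStep, error: string) => FixStep;

type Outcome =
  | { kind: "execute"; steps: FixStep[] }
  | { kind: "manual"; reason: string };

// Assumed error format: "step 2: <message>". The real format is not shown in the PR.
function parseStepIndex(error: string): number | null {
  const match = /step (\d+)/i.exec(error);
  return match ? Number(match[1]) : null;
}

function planOrFallback(steps: FixStep[], validate: Validate, repairStep: RepairStep): Outcome {
  // Nothing to repair: go straight to manual steps.
  if (steps.length === 0) {
    return { kind: "manual", reason: "Refined plan has no fix steps" };
  }
  const errors = validate(steps);
  if (errors.length === 0) {
    return { kind: "execute", steps };
  }
  // Exactly one repair pass over the offending steps.
  const repaired = [...steps];
  for (const error of errors) {
    const index = parseStepIndex(error);
    if (index === null) continue;
    const current = repaired[index];
    if (current === undefined) continue;
    repaired[index] = repairStep(current, error);
  }
  // Re-validate; anything still invalid falls back to manual instructions.
  const remaining = validate(repaired);
  return remaining.length === 0
    ? { kind: "execute", steps: repaired }
    : { kind: "manual", reason: remaining.join("; ") };
}
```

Limiting the repair to a single pass keeps the failure path bounded: a plan either becomes valid after one attempt or is handed to the manual-steps fallback.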
The API change in the previous commit returns
`{ guidedOnly: true, guidedSteps, error }` when auto-fix gives up. This
commit threads that response shape through the trigger-task progress
metadata and the Remediation dialog so customers actually see the
manual steps instead of a raw error.
- `classifyExecuteResult` recognizes the new shape and emits a
`{ type: 'manual', reason, guidedSteps }` classification. Defensive
parsing strips non-string entries and ignores `guidedOnly` without
real steps. Permission errors keep their existing precedence.
- `remediateSingle` trigger task carries a new `phase: 'manual'` plus
`guidedSteps` in its progress payload.
- `RemediationDialog` reacts to the new phase by switching its
preview state into the existing guided-only rendering (same UI used
for plans where the AI declared `canAutoFix: false` upfront).
- The two batch-fix paths (single-account + integrations) treat the
manual classification as `failed` with the AI-generated reason — the
batch UI doesn't render per-finding guided steps, but the
user-facing message is now meaningful instead of cryptic. The
per-finding manual steps remain available via the single-fix dialog.
8 new tests on `execute-result.test.ts` (10 total) cover the manual
classification, the precedence rules, and the defensive parsing.
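The defensive parsing described for `classifyExecuteResult` might look roughly like the sketch below. It covers only the `manual` branch; how the real function detects permission errors (which keep precedence) is not shown in this commit message, so that part is omitted. `ManualClassification` and `extractManual` are hypothetical names, and the default reason string is an assumption.

```typescript
interface ManualClassification {
  type: "manual";
  reason: string;
  guidedSteps: string[];
}

// Returns a manual classification only when the response has guidedOnly: true
// AND at least one string entry in guidedSteps; otherwise returns null.
function extractManual(result: unknown): ManualClassification | null {
  if (typeof result !== "object" || result === null) return null;
  const record = result as Record<string, unknown>;
  if (record.guidedOnly !== true || !Array.isArray(record.guidedSteps)) return null;

  // Strip non-string entries rather than trusting the payload.
  const guidedSteps = record.guidedSteps.filter(
    (step): step is string => typeof step === "string",
  );
  if (guidedSteps.length === 0) return null;

  // Assumption: fall back to a generic reason when `error` is missing or not a string.
  const reason =
    typeof record.error === "string" ? record.error : "Automatic fix could not be applied";
  return { type: "manual", reason, guidedSteps };
}
```

A caller would check for a permission error first and only then consult `extractManual`, which matches the precedence stated above. In the batch paths, a non-null result would be reported as `failed` with `reason` as the message.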
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🎉 This PR is included in version 3.63.0 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
Summary
Customers were seeing raw "Fix could not be applied — <cryptic error>" in the Auto-Remediate dialog when the AI's refined plan was rejected by our pre-execution validator or AWS rejected a step the executor couldn't auto-repair. This PR converts every such failure path inside AWS `executeRemediation` into a graceful manual-steps fallback: the API returns real, customer-actionable instructions (AI-generated from the failure context), the trigger task carries them through, and the dialog renders the existing guided-steps UI instead of a red error banner.
Net effect: every fix attempt now ends in either "fix worked" or "here's a concrete checklist you can follow in AWS Console / CLI". No raw errors.
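The "never a raw error" guarantee rests on the manual-steps generator having a hard fallback of its own. A minimal sketch of that idea follows, assuming the model call is injected as a function and returns plain text with one step per line. `FailureContext`, `askModel`, `parseSteps`, and `fallbackManualSteps` are hypothetical names, the prompt is illustrative only, and reusing the failure reason as `reason` is an assumption.

```typescript
interface ManualSteps {
  guidedSteps: string[];
  reason: string;
}

// Hypothetical input shape: the finding, why the fix failed, and the adapter's static text.
interface FailureContext {
  findingTitle: string;
  failureReason: string;
  adapterRemediation: string;
}

// Hard fallback: the adapter's remediation text as a single step.
function fallbackManualSteps(ctx: FailureContext): ManualSteps {
  return { guidedSteps: [ctx.adapterRemediation], reason: ctx.failureReason };
}

// Splits model output into steps, dropping list markers such as "1.", "2)" or "-".
function parseSteps(modelText: string): string[] {
  return modelText
    .split("\n")
    .map((line) => line.replace(/^\s*(?:\d+[.)]|[-*])\s*/, "").trim())
    .filter((line) => line !== "");
}

async function generateManualSteps(
  ctx: FailureContext,
  askModel: (prompt: string) => Promise<string>,
): Promise<ManualSteps> {
  try {
    const prompt = `Finding: ${ctx.findingTitle}\nFailure: ${ctx.failureReason}\nWrite manual fix steps, one per line.`;
    const steps = parseSteps(await askModel(prompt));
    // An empty answer is treated like a failed call.
    return steps.length > 0
      ? { guidedSteps: steps, reason: ctx.failureReason }
      : fallbackManualSteps(ctx);
  } catch {
    // The model call itself failed: still return usable instructions.
    return fallbackManualSteps(ctx);
  }
}
```

Because both the empty-answer case and the thrown-error case resolve to the adapter text, the caller always receives a non-empty `guidedSteps` list.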
How it works (end-to-end)
Changes
`apps/api/src/cloud-security/ai-remediation.service.ts`
- `generateManualSteps(...)`: Sonnet-powered. Inputs: finding, failed plan, failure reason. Output: `{ guidedSteps: string[], reason: string }`. Hard fallback to the adapter's `remediation` text if the AI call itself throws.
- Exports `FindingContext` for the orchestration layer.
`apps/api/src/cloud-security/aws-command-executor.ts`
- `looksLikeValidationError` now matches `MissingParameter`, "must contain the parameter", "missing parameter", "parameter is required", "must specify". The earlier regex missed EC2-style wording, so the AI step-repair never fired for those findings.
`apps/api/src/cloud-security/remediation.service.ts`
- `repairInvalidSteps`: parses step indices from validator errors and calls `refineStepFromError` per offending step before falling back. Closes the gap where the executor's own AI step-repair never got a chance because the plan never reached execution.
- `respondWithManualSteps`: generates manual steps, persists the action as failed, returns the response shape the frontend already renders for `canAutoFix: false` plans.
- Every throw inside `executeRemediation` swapped for the appropriate fallback. Permission errors still flow through the existing catch (don't shadow the polished fixScript UX).
`apps/api/src/cloud-security/ai-remediation.service.ts` (other change)
- Broadened `ACTIONABLE_PREFIXES` so security-group / IAM-style plans (`Authorize` / `Revoke` / `Allow` / `Deny` / `Disable` / `Detach` / `Add` / `Remove` / `Register` / `Deregister` / `Tag` / `Untag`) produce meaningful `willChange` diffs instead of `{}` `{}`.
`apps/app/src/trigger/tasks/cloud-security/execute-result.ts`
- `manual` classification + defensive parsing of `guidedSteps` (strips non-strings, requires `guidedOnly: true` AND a non-empty list).
`apps/app/src/trigger/tasks/cloud-security/remediate-single.ts`
- `phase: 'manual'` in progress + `guidedSteps` field.
`apps/app/src/app/(app)/[orgId]/cloud-tests/components/RemediationDialog.tsx`
- On `phase: 'manual'`, switch preview into `guidedOnly: true` rendering. Same UI the dialog already uses for `canAutoFix: false` plans.
Batch flows
`cloud-tests/actions/batch-fix.ts` + `integrations/[slug]/actions/batch-fix.ts` + `remediate-batch-helpers.ts` treat the `manual` classification as `failed` with the AI-generated reason. The per-finding guided steps remain available via the single-fix dialog.
Tests
- `apps/api`: 16 tests on `ai-remediation.service.spec.ts` (+4 new for `generateManualSteps`). 267/267 cloud-security tests pass.
- `apps/app`: 10 tests on `execute-result.test.ts` (+5 new for the `manual` classification). All trigger task tests pass.
What this PR is NOT
- GCP and Azure remediation services have the same throw-on-validation patterns (`gcp-remediation.service.ts` lines 200, 205, 208, 239, 288, 315; `azure-remediation.service.ts` lines 136, 149, 252) and would benefit from the same treatment. Left for a follow-up PR per the requested scope ("for now just do only for AWS").
- … `MethodNotAllowed`, `ResourceConflict`) will still bypass AI repair but now end up in the manual-steps fallback instead of as raw errors.
Manual test plan
🤖 Generated with Claude Code
Summary by cubic
Adds a graceful manual-steps fallback to AWS auto-remediation so users never see cryptic errors. When a plan is invalid or execution fails (except permission errors), the API returns guided steps and the dialog switches to the guided-only UI.
New Features
- `executeRemediation`: on read-step validation failure, empty fix steps, post-repair validation failure, or non-permission execution errors, return `{ guidedOnly: true, guidedSteps, error }`.
- `generateManualSteps` builds clear, ordered instructions from the finding, failed plan, and failure reason, with a safe fallback to the adapter’s remediation text.
- `repairInvalidSteps` parses validator errors, repairs offending steps with `refineStepFromError`, then re-validates before falling back.
- `classifyExecuteResult` emits `type: 'manual'`; `remediate-single` publishes `phase: 'manual'` with `guidedSteps`; `RemediationDialog` renders guided-only steps; batch-fix marks as failed with the generated reason. Permission errors keep the existing fix-script UX.
Bug Fixes
- … `MissingParameter`, “missing parameter”, “parameter is required”, “must specify”, etc.) so auto-repair paths trigger reliably.
- … `willChange` diffs.
Written for commit f6c7d94. Summary will update on new commits.