ci: speed up Claude PR review to cut timeouts#1753
Conversation
The Claude review job frequently hit the 30-min timeout. Log analysis of several runs (incl. a 30.5m timeout) showed the dominant cost is the inference proxy throttling the job (429/503/529 → SDK backoff ate ~22 of 30 min), not the PR size — the same 6-file PR ranged 12.5m to a 30.5m timeout. Turn count was not the constraint (a 71-turn run finished in 11.5m; a 25-turn run timed out). Reduce request volume (what trips the rate limit) within the workflow: - Batch independent tool calls into single turns - Make sub-package mode.py/config.py/__init__.py pre-reads conditional on the diff actually touching registration/config/public API - Scoped diffs instead of one giant 'gh pr diff'; hunks-only reads; don't re-read lines the diff already shows - Prioritize modelopt/ > examples/ > tests/; cap large (>50-file) PRs at ~15 files - Restrict cross-file symbol tracing to new/renamed public symbols - Post findings as you go so partial runs still deliver value - Block subagent fan-out (--disallowedTools Task); drop --max-turns (wrong lever) The dominant remaining fix is infra-side (raise proxy quota, re-enable prompt caching) and is out of scope for this workflow change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe Claude Code Review GitHub Actions workflow prompt is updated to add input safety handling for the triggering comment, disallow the ChangesClaude Review Workflow Prompt & Budget Refinement
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Pass the trigger comment body into the prompt so reviewers can scope a run, e.g. '/claude review only modelopt/torch files' or 'focus on the export path'. Previously the comment text was discarded (the prompt was a static template), so any extra instructions were ignored. The comment is injected as untrusted input via the COMMENT_BODY env var and the prompt instructs Claude to treat it only as review-scope guidance, never as instructions to alter the procedure or approval logic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
/claude review |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/claude_review.yml (1)
88-100: ⚡ Quick winPrompt injection risk despite mitigation.
While the instructions explicitly tell Claude to treat the comment as untrusted and scope-only, natural-language constraints are notoriously bypassable through prompt injection techniques. A carefully crafted comment could potentially manipulate Claude to skip review steps, alter approval logic, or focus on misleading areas.
Current mitigations (trusted-user restriction, explicit instructions, tool restrictions) reduce but don't eliminate this risk. Consider additional safeguards:
- Parse the comment body for an explicit allow-list of scope keywords (e.g., "modelopt", "examples", "tests", file patterns) and reject or sanitize other content before substitution.
- Log the full comment body and Claude's interpreted scope in the workflow output for audit.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/claude_review.yml around lines 88 - 100, The workflow currently substitutes the COMMENT_BODY environment variable directly into the reviewer_comment section without validation, creating a prompt injection vulnerability. Add a validation step before the Claude review task that parses the comment body and extracts only whitelisted scope keywords (such as file paths, directory names, or explicit patterns like "modelopt", "tests", "examples"). Reject or sanitize any content that doesn't match the allow-list. Additionally, add logging to the workflow output that captures both the full original comment body and the sanitized scope interpretation that gets passed to Claude, enabling audit trails and detection of attempted injection attacks. This validation should occur before the comment is substituted into the reviewer_comment variable used in the prompt.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/claude_review.yml:
- Around line 88-100: The workflow currently substitutes the COMMENT_BODY
environment variable directly into the reviewer_comment section without
validation, creating a prompt injection vulnerability. Add a validation step
before the Claude review task that parses the comment body and extracts only
whitelisted scope keywords (such as file paths, directory names, or explicit
patterns like "modelopt", "tests", "examples"). Reject or sanitize any content
that doesn't match the allow-list. Additionally, add logging to the workflow
output that captures both the full original comment body and the sanitized scope
interpretation that gets passed to Claude, enabling audit trails and detection
of attempted injection attacks. This validation should occur before the comment
is substituted into the reviewer_comment variable used in the prompt.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 84c8e1f9-c2f6-49a9-a650-510f98370b79
📒 Files selected for processing (1)
.github/workflows/claude_review.yml
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1753 +/- ##
=======================================
Coverage 58.45% 58.45%
=======================================
Files 510 510
Lines 56274 56274
=======================================
Hits 32896 32896
Misses 23378 23378
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Claude Review Summary
Findings: CRITICAL: 0 · IMPORTANT: 1 · SUGGESTION: 0
This is a workflow-only change that tunes the /claude review prompt to cut latency/timeouts. The intent is sound and most of the change is prompt-text refinement (batching reads, scoped diffs, conditional pre-reads, file-coverage caps, disallowing the Task tool). One change, however, would backfire against the PR's own goal.
Most impactful finding
[IMPORTANT Compatibility] The newly-recommended three-dot git diff BASE...HEAD hard-fails in this workflow's runtime. The job checks out shallow (fetch-depth: 1) and does a single git fetch origin <base_ref>, so there is no common merge base — three-dot diff errors with fatal: origin/main...HEAD: no merge base. I reproduced this in a clone with the same shallow state. Since step 2 makes this the primary diff command, every review would hit the error, spend turns recovering, and fall back — increasing the turn count/latency this PR aims to reduce. The prior gh pr diff (GitHub API, depth-independent) had no such failure mode.
Fix: keep gh pr diff with --name-only/--patch (shallow-safe, path-filterable), or switch to two-dot git diff origin/<base> HEAD -- ... (works in the shallow clone), or deepen the fetch. See the inline comment for details and trade-offs.
Risk assessment
Low overall risk — CI-only, backward compatible, no source/checkpoint/export impact. The one issue is a correctness bug in a recommended command that undermines the PR's latency objective rather than a safety problem; worth fixing before merge so the optimization actually lands.
🤖 Generated with Claude Code
The review job checks out shallow (fetch-depth: 1) and fetches only the base tip, so there is no local merge base. The previously recommended three-dot 'git diff <BASE>...HEAD' would fail with 'no merge base' on every run and waste turns recovering. Switch the scoped diff to two-dot 'git diff <BASE> HEAD -- <paths>', which compares the two tree tips directly and is shallow-safe while still path-scopable. Caught by the Claude review on this PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
|
/claude review focus on the .github/workflows/claude_review.yml changes — verify the two-dot diff fix (29f8820) resolves the shallow-checkout 'no merge base' failure, and re-check the prompt edits for any other commands that would fail in this workflow's shallow runtime. |
There was a problem hiding this comment.
Claude Review — focused re-check (two-dot diff fix + shallow-runtime safety)
Findings: CRITICAL: 0 · IMPORTANT: 0 · SUGGESTION: 1
Scope applied per the reviewer's request: verify that commit 29f8820 (two-dot git diff) resolves the shallow-checkout 'no merge base' failure, and re-check the prompt edits for any other commands that would fail in this workflow's shallow runtime.
✅ Prior IMPORTANT finding is resolved
I reproduced the workflow's exact runtime state locally (depth-1 checkout of the head SHA plus git fetch origin main, giving git rev-list --count HEAD = 1 with origin/main present as a remote-tracking ref) and confirmed:
- Two-dot 'git diff origin/main HEAD -- .github/' → succeeds, emits the correct diff.
- Three-dot 'git diff origin/main...HEAD' → fails with exactly 'fatal: origin/main...HEAD: no merge base'.
The fix in step 2 (line 192, 'git diff HEAD -- modelopt/ examples/') is the working form. The latency-regression risk flagged in the previous review is eliminated.
✅ No other shallow-unsafe commands are instructed
Every command the prompt actually directs the reviewer to run is depth-independent:
- 'gh pr view … comments,reviews' / '… files …' and 'gh pr review …' are GitHub-API calls (depth-independent).
- The only instructed git command is the verified two-dot diff.
- 'git show' / 'git log' appear in --allowedTools but are not steered by any workflow step, so nothing pushes the reviewer toward a shallow-failing invocation.
SUGGESTION (non-blocking)
Two-dot 'git diff origin/main HEAD' compares tree states, so if main advances on a scoped path after the PR branches, unrelated upstream changes can appear in the scoped diff as apparent reversions. For this CI-only PR there is no divergence and the output is clean, and two-dot is strictly better than the hard-failing three-dot form, so this does not block. If noise is ever observed, 'gh pr diff' (already in --allowedTools) is API-based, depth-independent, and yields the exact PR diff — a fine fallback. Not required for merge.
Risk assessment
Low. CI-only, backward compatible, no source/checkpoint/export impact. The previously-flagged blocking issue is verified fixed in the shallow runtime; no remaining blocking issues.
Claude review passed — no blocking issues found. LGTM
🤖 Generated with Claude Code
|
What does this PR do?
Type of change: Bug fix (CI/workflow)
The
Claude Code Reviewjob frequently hits the 30-min timeout. I analyzed several recent runs to find why and tuned the workflow to reduce it.Root cause (from log analysis): the dominant cost is the inference proxy throttling the job —
429/503/529responses trigger SDK exponential backoff, which ate ~22 of 30 minutes in the timeout run. It is not PR size: the same 6-file PR ranged 12.5m → 23.7m → 30.5m timeout. Turn count is also not the constraint — a 71-turn run finished in 11.5m while a 25-turn run timed out at 28m (slow turns scattered from the start, consistent with throttling, not context growth).The lever within the workflow's control is total request volume (each tool round-trip is a separate uncached, rate-limited request). This PR reduces it:
mode.py/config.py/__init__.pywhen the diff touches registration/config/public API (was unconditional)gh pr diff; hunks-only reads; don't re-read lines the diff already showsmodelopt/>examples/>tests/; cap large (>50-file) PRs at ~15 files--disallowedTools "Task"); drop--max-turns(the data showed it was the wrong lever)Testing
Workflow-only change; validated by the run-log analysis described above. Real effect will be visible on the next
/claude reviewinvocations.Out of scope
The dominant remaining fix is infra-side and not addressable here: raise the proxy rate-limit/quota and re-enable prompt caching (
DISABLE_PROMPT_CACHING). Tracking separately with the proxy team.CONTRIBUTING.md: N/A🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
Note: Internal infrastructure change only—no direct impact on end-user features or functionality.