…script
Automates E2E failure triage with three new components:
- scripts/download-e2e-artifacts.sh: reusable script to download CI artifacts
- .claude/skills/e2e-triage/SKILL.md: 7-step triage skill (classify flaky vs real bug, create PRs or issues)
- .github/workflows/e2e-triage.yml: workflow_run trigger that auto-runs Claude Opus on E2E failure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1aa72dcd8a2b
Post "Claude is triaging..." when triage starts and a structured summary with PR/issue links when it completes. The skill now writes triage-summary.json, which the workflow parses with jq for the Slack message. Falls back to a warning if no summary is produced.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8e5dcc6ef8ab
…ifications
- Build Slack payload via jq (payload-file-path) instead of interpolating raw text into inline JSON, which broke on quotes/newlines in summaries
- Add secrets.E2E_SLACK_WEBHOOK_URL guard to "Build Slack summary" and "Notify Slack - triage complete" steps (matching the "started" step)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7c1914052967
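The jq-based payload construction can be sketched as below. This is a minimal illustration, not the workflow's actual step: the function name build_slack_payload is hypothetical, and the single `text` field assumes a basic Slack incoming-webhook message. Because the summary reaches jq through `--arg`, jq JSON-escapes quotes and newlines instead of having raw text spliced into a JSON literal.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: wrap an arbitrary summary string in a Slack webhook
# payload. jq escapes quotes, backslashes, and newlines in "$1" itself.
build_slack_payload() {
  jq -n -c --arg text "$1" '{text: $text}'
}

# Usage (assumed): write the payload to a file for the step's payload-file-path input.
# build_slack_payload "$summary" > slack-payload.json
```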
Rewrite SKILL.md with dual-mode support (auto-detected via the WORKFLOW_RUN_ID env var): local mode runs tests with mise and re-runs failures up to 3 times; CI mode triggers e2e-isolated.yml workflows for re-run verification. Classification now uses re-run results as the primary signal (all fail = real-bug, mixed results = flaky).
Workflow changes: actions permission upgraded to write for gh workflow run, timeout increased to 60m for re-run polling, Claude prompt updated with CI mode hint and re-run instructions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9f75c3effd9b
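The re-run classification rule can be sketched as a small function. This is a hypothetical illustration under two assumptions: the test has already failed once in CI, and each re-run outcome is recorded as the string pass or fail. Any passing re-run therefore means mixed results (flaky); only uniform failures point to a real bug.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: classify a test that already failed once in CI,
# given the outcomes of its re-runs (each "pass" or "fail").
classify_failure() {
  if [ "$#" -eq 0 ]; then
    echo "unknown"      # no re-runs, so no signal
    return 0
  fi
  local outcome
  for outcome in "$@"; do
    if [ "$outcome" = "pass" ]; then
      echo "flaky"      # a pass after the original failure = mixed results
      return 0
    fi
  done
  echo "real-bug"       # every re-run failed as well
}

# Example: classify_failure fail pass fail   # prints "flaky"
```

As a later commit in this PR points out, a consistent failure can still be a test-infrastructure bug (e2e/ code) rather than a product bug, so this rule only separates deterministic failures from intermittent ones.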
Local mode now presents findings interactively and applies fixes directly in the working tree instead of creating branches/PRs/issues:
- Step 4a: findings report, proposed fixes, user approval gate, in-place fixes
- Step 4b: unchanged CI behavior (batched PR for flaky, issues for real bugs)
- Step 5: local mode gets simpler summary table, no triage-summary.json
Entire-Checkpoint: 4e1d9cf59d52
Consistent test failures can be test infrastructure bugs (e2e/ code), not product bugs (cmd/entire/cli/). Update classification signals, fix lists, and action sections to distinguish the two.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 28c90fcc7266
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: ba6877944a6c
Replace duplicated artifact-reading steps in e2e-triage Step 1 with a reference to debug-e2e's Debugging Workflow (steps 2-5), keeping the collect list so classification inputs remain clear. Add Related Skills section to README.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: eb14496bde1e
Entire-Checkpoint: bb778fbab533
…d GitHub issues
- Remove workflow_run trigger from e2e-triage.yml (now dispatch-only)
- Remove issues permission and gh issue commands from CI mode
- Replace real-bug GitHub issues with structured CI log reports
- Add triage link to Slack failure notification in e2e.yml
- Update skill docs and README to reflect new behavior
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 4725d29fd8ff
Remove CI mode (branch creation, PRs, triage-summary.json, CI re-runs via gh workflow run) from the e2e-triage skill while preserving local debugging of CI failures (downloading artifacts, analyzing them, running tests locally). Also removes the e2e-triage.yml workflow and the triage link from the E2E Slack failure notification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c8ddf0fdb9df
Teach Step L1 to accept CI run references (latest, run ID, run URL) and use scripts/download-e2e-artifacts.sh to fetch artifacts, skipping local re-runs and jumping straight to shared analysis.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: fb0c122d807e
Skip re-downloading when the artifact directory already exists and is non-empty, printing a log message instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5f75d4661043
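The skip-if-present check can be sketched as follows. The function name artifacts_present is hypothetical, and counting hidden files as content (via ls -A) is an assumption; the actual script may test this differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if "$1" is a directory with at least one
# entry (hidden files such as .run-info.json count).
artifacts_present() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Usage sketch:
# if artifacts_present "$dest"; then
#   echo "Artifacts already in $dest/, skipping download" >&2
# else
#   gh run download "$run_id" --dir "$dest" 1>&2
# fi
```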
…ge skill
Pre-create ~/.config/cursor/ in Bootstrap() so the cursor CLI doesn't crash with ENOENT when writing cli-config.json after accepting workspace trust on CI. Follows the same pattern used by Claude, Gemini, and Droid agents.
Update e2e-triage skill to require running real E2E tests after applying fixes, scoped by change type: agent-specific → that agent's full suite, shared infra → all affected agents, prompt-only → just the affected test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: fa1e4e3fb457
Split the monolithic e2e-triage skill into three focused commands (triage-ci, debug, implement) following the agent-integration plugin pattern. Triage-ci is report-only, implement is action-only, and the /e2e orchestrator runs both sequentially.
- Create .claude/plugins/e2e/ with plugin.json and command wrappers
- Create .claude/skills/e2e/ with orchestrator SKILL.md and 3 procedures
- Delete old .claude/skills/e2e-triage/ and .claude/skills/debug-e2e/
- Update all /debug-e2e references to /e2e:debug in agent-integration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 2b582392c6b6
PR Summary: Low Risk. Overview: Adds … Hardens the … (Written by Cursor Bugbot for commit 3e27584.)
Pull request overview
Adds tooling and documentation to improve debugging/triage of flaky E2E runs (especially Cursor), including a helper script for downloading CI artifacts and new .claude skill/plugin docs for a triage→fix workflow.
Changes:
- Add scripts/download-e2e-artifacts.sh to fetch and normalize GitHub Actions E2E artifacts locally.
- Update Cursor E2E agent bootstrap to pre-create the Cursor config directory to avoid runtime ENOENT failures.
- Add/update E2E triage/debug/implement skill + plugin documentation and refresh E2E README guidance.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scripts/download-e2e-artifacts.sh | New helper script to locate a run (latest/ID/URL), download artifacts, flatten wrapper dirs, and write .run-info.json. |
| e2e/agents/cursor_cli.go | Create Cursor config directory during bootstrap to reduce flaky failures. |
| e2e/README.md | Document local artifact download + new triage workflow references. |
| cmd/entire/cli/strategy/common_test.go | Minor formatting/alignment in tests. |
| .claude/skills/e2e/triage-ci.md | New CI triage procedure doc (download artifacts or rerun locally, classify flaky vs real-bug). |
| .claude/skills/e2e/implement.md | New procedure doc for applying fixes and verifying via scoped E2E runs. |
| .claude/skills/e2e/debug.md | Update/normalize debug procedure doc formatting. |
| .claude/skills/e2e/SKILL.md | Add orchestrator skill definition for the E2E triage→implement pipeline. |
| .claude/skills/agent-integration/*.md | Update references to use /e2e:debug instead of the old command name. |
| .claude/plugins/e2e/** | Add local plugin command wrappers + plugin metadata and README. |
| ;; | ||
| http*) | ||
| # Extract run ID from URL: https://github.com/<owner>/<repo>/actions/runs/<id> | ||
| run_id=$(echo "$input" | grep -oE '/runs/[0-9]+' | grep -oE '[0-9]+') |
With set -euo pipefail, this URL parsing pipeline will cause the script to exit immediately if the URL doesn’t contain /runs/<id> (grep returns non-zero), so the subsequent "Could not extract" check never runs. Capture the pipeline’s failure explicitly (e.g., via if ! run_id=...; then die ...; fi or || true) so invalid URLs produce the intended error message instead of an abrupt exit.
| run_id=$(echo "$input" | grep -oE '/runs/[0-9]+' | grep -oE '[0-9]+') | |
| if ! run_id=$(echo "$input" | grep -oE '/runs/[0-9]+' | grep -oE '[0-9]+'); then | |
| die "Could not extract run ID from URL: $input" | |
| fi |
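The reviewer's fix can be exercised in isolation. In this sketch, extract_run_id is a hypothetical helper that returns non-zero instead of letting set -e abort the script, and die is an assumed equivalent of the script's own error helper. Running the pipeline as an if condition (or on the left of ||) suspends errexit, so the caller gets to print its intended message.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed equivalent of the script's die(): message to stderr, then exit 1.
die() { echo "error: $*" >&2; exit 1; }

# Hypothetical helper: print the numeric run ID from a GitHub Actions run URL.
# With pipefail, a URL without /runs/<id> makes the first grep (and therefore
# the whole pipeline) fail; testing it under `if !` keeps set -e from firing.
extract_run_id() {
  local run_id
  if ! run_id=$(printf '%s\n' "$1" | grep -oE '/runs/[0-9]+' | grep -oE '[0-9]+'); then
    return 1
  fi
  printf '%s\n' "$run_id"
}

# Caller side: the failure becomes a readable error instead of a silent exit.
# run_id=$(extract_run_id "$input") || die "Could not extract run ID from URL: $input"
```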
| else | ||
| mkdir -p "$dest" | ||
| log "Downloading artifacts to $dest/ ..." | ||
| gh run download "$run_id" --dir "$dest" 2>&1 >&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)." |
The redirection 2>&1 >&2 does not enforce the contract that only the final absolute path is written to stdout. Redirections are applied left to right: 2>&1 first points stderr at the current stdout, then >&2 points stdout at that same target, so both streams of gh run download end up on stdout. Redirect only the command's stdout to stderr (1>&2) so callers can safely capture stdout.
| gh run download "$run_id" --dir "$dest" 2>&1 >&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)." | |
| gh run download "$run_id" --dir "$dest" 1>&2 || die "Failed to download artifacts. They may have expired (retention: 7 days)." |
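A self-contained way to see the difference between the two redirections is to capture a stand-in command that writes to both streams. noisy below is a hypothetical placeholder for gh run download; the point is only which lines land in the captured value.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for `gh run download`: one line on each stream.
noisy() {
  echo "progress on stdout"
  echo "warning on stderr" >&2
}

# Original form. Redirections apply left to right: 2>&1 aims stderr at the
# capture pipe, then >&2 aims stdout at that same pipe, so both lines are captured.
leaky=$(noisy 2>&1 >&2)

# Suggested form: stdout is sent to stderr and stderr is untouched, so nothing
# reaches the capture; both lines go to the terminal's stderr instead.
clean=$(noisy 1>&2)
```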
| # --- Write run metadata --- | ||
| agents_found=$(cd "$dest" && ls -d */ 2>/dev/null | tr -d '/' | tr '\n' ', ' | sed 's/,$//') |
agents_found is computed via ls -d */ | ... under set -euo pipefail. If the download directory contains no subdirectories (e.g., no artifacts were present or the layout changes), the */ glob/ls will fail and the script will exit before writing .run-info.json. Consider using a glob with nullglob, find, or otherwise handling the empty case so the script fails with a clear message (or records an empty agent list) instead of exiting on ls.
| agents_found=$(cd "$dest" && ls -d */ 2>/dev/null | tr -d '/' | tr '\n' ', ' | sed 's/,$//') | |
| agents_found=$(cd "$dest" && find . -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | tr '\n' ',' | sed 's/,$//') |
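A portable alternative that handles the empty case without ls, and without find -printf (a GNU extension that BSD find on macOS lacks), is a nullglob loop. list_agent_dirs is a hypothetical name; running the function body in a subshell keeps the shopt change from leaking into the rest of the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the names of the immediate subdirectories of "$1"
# as a comma-separated list (empty string if there are none).
# The ( ... ) body runs in a subshell, so nullglob stays local to this call.
list_agent_dirs() (
  shopt -s nullglob                 # an unmatched */ expands to nothing
  out=""
  for d in "$1"/*/; do
    d="${d%/}"                      # strip the trailing slash from the match
    out="${out:+$out,}${d##*/}"     # append the basename, comma-separated
  done
  printf '%s\n' "$out"
)

# Usage: agents_found=$(list_agent_dirs "$dest")
```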
e2e/agents/cursor_cli.go
| home, err := os.UserHomeDir() | ||
| if err != nil { | ||
| return fmt.Errorf("get home dir: %w", err) | ||
| } | ||
| dir := filepath.Join(home, ".config", "cursor") |
This hard-codes the Linux-style config path ~/.config/cursor. To keep local E2E runs working across OSes (and respect XDG_CONFIG_HOME), prefer deriving the base config dir with os.UserConfigDir() and then appending cursor.
| home, err := os.UserHomeDir() | |
| if err != nil { | |
| return fmt.Errorf("get home dir: %w", err) | |
| } | |
| dir := filepath.Join(home, ".config", "cursor") | |
| cfgDir, err := os.UserConfigDir() | |
| if err != nil { | |
| return fmt.Errorf("get user config dir: %w", err) | |
| } | |
| dir := filepath.Join(cfgDir, "cursor") |
e2e/README.md
| Use the `debug-e2e` skill (`.claude/skills/debug-e2e/`) for a structured workflow when investigating failures. | ||
| Use the `e2e-triage` skill (`.claude/skills/e2e-triage/`) to automate full triage: download CI artifacts, classify failures as flaky vs real bug, and create PRs or GitHub issues. Run locally with `/e2e-triage` or see the automated CI workflow below. |
This references an e2e-triage skill at .claude/skills/e2e-triage/ and a /e2e-triage command, but this PR adds the E2E skills under .claude/skills/e2e/ and the plugin commands as /e2e:triage-ci (or the orchestrator /e2e). Update the paths/command names here so the README matches what actually exists in the repo.
| Use the `debug-e2e` skill (`.claude/skills/debug-e2e/`) for a structured workflow when investigating failures. | |
| Use the `e2e-triage` skill (`.claude/skills/e2e-triage/`) to automate full triage: download CI artifacts, classify failures as flaky vs real bug, and create PRs or GitHub issues. Run locally with `/e2e-triage` or see the automated CI workflow below. | |
| Use the E2E debug workflow in the `e2e` skill (`.claude/skills/e2e/`) for a structured workflow when investigating failures. | |
| Use the CI triage workflow in the same `e2e` skill to automate full triage: download CI artifacts, classify failures as flaky vs real bug, and create PRs or GitHub issues. Run locally with the `/e2e:triage-ci` plugin command (or invoke the orchestrator with `/e2e`), or see the automated CI workflow below. |
e2e/README.md
| - **`.github/workflows/e2e.yml`** — Runs full suite on push to main. Matrix: `[claude, opencode, gemini]`. | ||
| - **`.github/workflows/e2e-isolated.yml`** — Manual dispatch for debugging a single test. Inputs: agent + test name filter. | ||
| - **`.github/workflows/e2e-triage.yml`** — Auto-triggers on E2E failure via `workflow_run`. Runs Claude Code (Opus) to download artifacts, classify failures, and create PRs (flaky) or issues (real bugs). |
The README lists .github/workflows/e2e-triage.yml, but there’s no such workflow file in .github/workflows/ in this branch. Either add the workflow in this PR or remove/adjust this bullet so the documentation doesn’t point at a non-existent file.
| - **`.github/workflows/e2e-triage.yml`** — Auto-triggers on E2E failure via `workflow_run`. Runs Claude Code (Opus) to download artifacts, classify failures, and create PRs (flaky) or issues (real bugs). |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
| cp -r "$wrapper"/* "$agent"/ 2>/dev/null || true | ||
| else | ||
| mv "$wrapper" "$agent" | ||
| fi |
Wrapper directory not removed after copy operation
Low Severity
In the restructuring loop, when the $agent directory already exists, cp -r copies from the e2e-artifacts-* wrapper into it but never removes the wrapper directory. The stale e2e-artifacts-* directory then gets picked up by the ls -d */ on line 92, polluting agents_found in .run-info.json with entries like e2e-artifacts-claude-code alongside claude-code.
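One way to address the finding is to delete each wrapper once its contents have been merged. This sketch assumes wrapper directories are named e2e-artifacts-<agent> with the agent name as the suffix; flatten_wrappers is a hypothetical name, not the script's actual code. cp -R "$wrapper"/. copies hidden files too and behaves the same with GNU and BSD cp.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: flatten e2e-artifacts-<agent>/ wrappers inside "$1" into
# <agent>/, merging into an existing <agent>/ and then removing the wrapper so
# it cannot show up later in the agent list.
flatten_wrappers() (
  cd "$1"
  shopt -s nullglob
  for wrapper in e2e-artifacts-*/; do
    wrapper="${wrapper%/}"
    agent="${wrapper#e2e-artifacts-}"   # assumed naming scheme
    if [ -d "$agent" ]; then
      cp -R "$wrapper"/. "$agent"/      # merge contents, including dotfiles
      rm -rf "$wrapper"                 # the cleanup the finding says is missing
    else
      mv "$wrapper" "$agent"
    fi
  done
)
```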
Cursor's atomic config write (cli-config.json.tmp → cli-config.json)
races when parallel tests trigger "Workspace Trust Required"
simultaneously. Pre-seeding the file with {} in Bootstrap() avoids
the temp-file rename path entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 47c9b5f45145
The previous pre-seed fix wasn't sufficient because cursor's write-after-trust-acceptance still uses atomic temp-file rename on the shared ~/.config/cursor/cli-config.json. With 43 parallel tests, multiple cursor processes race on the same .tmp file.
Fix: give each tmux session its own XDG_CONFIG_HOME + HOME pointing to an isolated temp directory with a pre-seeded cli-config.json, following the same pattern Claude Code uses with CLAUDE_CONFIG_DIR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: df802d707cb9
- Add per-session XDG_CONFIG_HOME with pre-seeded cli-config.json to prevent ENOENT race on parallel tests (keep real HOME for auth)
- Wait for "Add a follow-up" instead of PromptPattern() to avoid premature WaitFor settling during Thinking phase
- Clean up temp config dir on session close
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 121daaca3e20
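The per-session isolation can be sketched in shell. This is an illustration only: the actual fix lives in the Go E2E harness, make_session_config is a hypothetical name, and the layout $XDG_CONFIG_HOME/cursor/cli-config.json is assumed from the ~/.config/cursor/cli-config.json path mentioned in the earlier commits. HOME is deliberately left alone so the agent keeps its real credentials.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create a private config root for one test session,
# pre-seeded with an empty cursor/cli-config.json, and print its path.
make_session_config() {
  local root
  root=$(mktemp -d "${TMPDIR:-/tmp}/cursor-e2e.XXXXXX")
  mkdir -p "$root/cursor"
  printf '{}\n' > "$root/cursor/cli-config.json"
  printf '%s\n' "$root"
}

# Usage sketch (agent command is a placeholder):
# cfg=$(make_session_config)
# XDG_CONFIG_HOME="$cfg" <agent command> ...   # HOME unchanged, so auth still works
# rm -rf "$cfg"                                # clean up when the session closes
```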
When Cursor commits mid-turn before flushing its transcript, condensation fails silently and the checkpoint ID is never recorded. This means the commit trailer points to nonexistent checkpoint metadata.
Fix by separating intent from execution: record checkpoint IDs on condensation *attempt* (not just success), and fall back to deferred CondenseSession at stop time when UpdateCommitted returns ErrCheckpointNotFound. No behavior change for agents that flush transcripts before committing (Claude Code, Gemini, etc.).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: fb5a3e2fb741