Add Claude AI test failure analysis to Slack notifications #3381
robbycochran wants to merge 4 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master    #3381   +/-   ##
=======================================
  Coverage   27.34%   27.34%
=======================================
  Files          95       95
  Lines        5420     5420
  Branches     2545     2545
=======================================
  Hits         1482     1482
  Misses       3211     3211
  Partials      727      727
Automatically analyzes integration test failures using Claude AI and includes intelligent insights in Slack notifications to #team-acs-collector-oncall.

Key features:
- Claude has full source code access via claude-code-base-action
- Analyzes JUnit XML reports, failing test source, and git history
- Detects platform-specific patterns (arch/OS)
- Provides file:line precision and actionable recommendations
- Skill-based approach (.claude/commands/) for maintainability
- Graceful fallback if analysis fails
- Test with PR label without Slack spam

Architecture:
- Integration tests fail → collect-failures job identifies failures
- analyze-and-notify reusable workflow runs Claude skill
- Claude creates analysis-report.md with root cause analysis
- notify job posts to Slack (skipped for PR label tests)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
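The job graph described under "Architecture" could be wired up roughly as follows. This is only a sketch under stated assumptions, not the workflow in this PR: the workflow file path, the `run` commands, the `make` target, and the `skip-slack` input are hypothetical; only the job names `collect-failures` and `analyze-and-notify` come from the description.

```yaml
# Hypothetical sketch of the described job graph; not the PR's actual workflow.
jobs:
  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make integration-tests          # assumed test entry point

  collect-failures:
    needs: integration-tests
    if: ${{ failure() }}                     # run only when the tests failed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4   # fetch JUnit XML reports
        with:
          path: reports

  analyze-and-notify:
    needs: collect-failures
    # always() is needed because an upstream job (integration-tests) failed;
    # without it this job would be skipped even if collect-failures succeeded.
    if: ${{ always() && needs.collect-failures.result == 'success' }}
    uses: ./.github/workflows/analyze-and-notify.yml   # assumed path
    with:
      # Hypothetical input: skip the Slack post when triggered by a PR label.
      skip-slack: ${{ github.event_name == 'pull_request' }}
    secrets: inherit
```

The explicit `always()` guard on the last job reflects how GitHub Actions evaluates `needs` chains: a failure anywhere upstream causes dependent jobs to be skipped by default.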
bed7306 to ebe5421
I'm confused, why do we need a completely separate workflow for this? The integration-tests one already has GCP authenticated, it has all the reports downloaded... Can't we just add the couple of steps that do the analysis in there and update the notify step? If you want the analysis step to be reusable I'd suggest using an action instead of a full-on workflow, but I don't see anything particularly complicated. I would venture to say all you need is to add the Analyze test failures with Claude step into the integration tests.
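As a rough illustration of that suggestion, the analysis could be a single step inside the existing integration-tests job. This is a sketch under assumptions: the action owner and ref (`anthropics/...@beta`), the `prompt_file` and `use_vertex` input names, and the skill file path should all be checked against the action's README and this PR; the placeholder notify step is hypothetical.

```yaml
# Fragment of the existing integration-tests job (surrounding steps omitted).
steps:
  - name: Analyze test failures with Claude
    if: ${{ failure() }}                 # only after an earlier step failed
    continue-on-error: true              # graceful fallback if analysis fails
    uses: anthropics/claude-code-base-action@beta   # owner/ref assumed
    with:
      prompt_file: .claude/commands/analyze-test-failures.md  # path assumed
      use_vertex: "true"                 # input name assumed
    env:
      ANTHROPIC_VERTEX_PROJECT_ID: ${{ secrets.GCP_CLAUDE_PROJECT_ID }}
      CLOUD_ML_REGION: us-east5

  - name: Notify Slack
    if: ${{ failure() }}
    # Placeholder for the existing notify step: include the report when present.
    run: |
      if [ -f analysis-report.md ]; then
        cat analysis-report.md
      else
        echo "No analysis available"
      fi
```

Because the job has already authenticated to GCP and downloaded the reports by this point, no extra checkout, auth, or artifact plumbing would be needed in this variant.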
Why is this in .github/scripts? Also, is this just a description of what analyze-test-failures.md does? Do we need 200+ lines of markdown to explain what a separate 100+ line markdown file does?
continue-on-error: true
env:
  ANTHROPIC_VERTEX_PROJECT_ID: ${{ secrets.GCP_CLAUDE_PROJECT_ID }}
  CLOUD_ML_REGION: us-east5
Should this be a GHA variable instead? You know... So we can change it without opening a PR if we have to update it.
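For example, the region could be read from the `vars` context. This is a minimal sketch, assuming a repository or organization variable named `CLOUD_ML_REGION` (the variable name is hypothetical) with the current value kept as a fallback:

```yaml
env:
  ANTHROPIC_VERTEX_PROJECT_ID: ${{ secrets.GCP_CLAUDE_PROJECT_ID }}
  # Read the region from a GHA configuration variable (name assumed).
  # An unset variable evaluates to an empty string, so fall back to the
  # value that is currently hard-coded.
  CLOUD_ML_REGION: ${{ vars.CLOUD_ML_REGION || 'us-east5' }}
```

With this in place, the region can be changed under the repository's Actions variables settings without a code change.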
Summary
Adds Claude CI failure analysis that runs before posting failures to Slack.