40 changes: 40 additions & 0 deletions .github/workflows/post-rca-comment.yml
@@ -0,0 +1,40 @@
name: Post Root Cause Analysis Comment

on:
  workflow_dispatch:
    inputs:
      issue_number:
        description: 'Issue number to comment on'
        required: true
        default: '12'

permissions:
  issues: write
  contents: read

jobs:
  post-comment:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Post Root Cause Analysis
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const issueNumber = Number(${{ toJSON(github.event.inputs.issue_number) }});

            // Read the root cause analysis file
            const commentBody = fs.readFileSync('Act-3/ROOT_CAUSE_ANALYSIS.md', 'utf8');

            // Post the comment
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: issueNumber,
              body: commentBody
            });

            console.log(`Posted root cause analysis to issue #${issueNumber}`);
42 changes: 42 additions & 0 deletions Act-3/HOW_TO_POST_RCA.md
@@ -0,0 +1,42 @@
# How to Post the Root Cause Analysis to the GitHub Issue

The root cause analysis for the ArgoCD deployment failure has been completed and documented in `ROOT_CAUSE_ANALYSIS.md`.

## Automated Options

### Option 1: Using GitHub CLI
```bash
cd Act-3
gh issue comment 12 --body-file ROOT_CAUSE_ANALYSIS.md
```

### Option 2: Using the Bash Script
```bash
cd Act-3
export GITHUB_TOKEN="your_github_token_here"
./post-rca-to-issue.sh 12
```

### Option 3: Using GitHub Actions Workflow
1. Go to the Actions tab in the repository
2. Select "Post Root Cause Analysis Comment" workflow
3. Click "Run workflow"
4. Enter issue number: `12`
5. Click "Run workflow"
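
The same workflow can also be started from a terminal instead of the Actions tab. The sketch below wraps `gh workflow run` with a basic input check. It assumes the GitHub CLI is installed and authenticated, that it runs inside a clone of this repository, and that the workflow file is available on the default branch. The function name `run_rca_workflow` and the `DRY_RUN` switch are hypothetical, introduced only for illustration.

```bash
#!/usr/bin/env sh
# Sketch: trigger the "Post Root Cause Analysis Comment" workflow from the CLI.
# run_rca_workflow and DRY_RUN are hypothetical names, not part of this repository.
run_rca_workflow() {
  issue_number="${1:-12}"

  # Accept digits only, so nothing unexpected is passed as the workflow input.
  case "$issue_number" in
    ''|*[!0-9]*)
      echo "issue number must contain digits only: $issue_number" >&2
      return 2
      ;;
  esac

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Preview the command without contacting GitHub.
    echo "gh workflow run post-rca-comment.yml -f issue_number=$issue_number"
  else
    gh workflow run post-rca-comment.yml -f "issue_number=$issue_number"
  fi
}
```

Calling `DRY_RUN=1 run_rca_workflow 12` prints the command that would be executed, which is a cheap way to check the input before dispatching the workflow for real.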

## Manual Option

If automated options are not available:

1. Open the GitHub issue: https://github.com/DevExpGbb/agentic-platform-engineering/issues/12
2. Copy the content from `ROOT_CAUSE_ANALYSIS.md`
3. Paste it as a new comment on the issue
4. Click "Comment"

## Summary of Findings

- **Root Cause:** Invalid Kubernetes manifest with malformed `apiVersion` field
- **Location:** `apps/broken-aks-store-all-in-one.yaml` line 178 in source repository
- **Issue:** `apiVersion: apps/v` should be `apiVersion: apps/v1`

See `ROOT_CAUSE_ANALYSIS.md` for complete details and remediation recommendations.
87 changes: 87 additions & 0 deletions Act-3/INVESTIGATION_SUMMARY.md
@@ -0,0 +1,87 @@
# Investigation Summary: ArgoCD Deployment Failure

- **Date:** 2026-02-03
- **Issue:** #12 - 🚨 ArgoCD Deployment Failed: 2-broken-apps
- **Application:** 2-broken-apps
- **Status:** ✅ Root Cause Identified

---

## Executive Summary

The `2-broken-apps` ArgoCD deployment fails because of an **intentionally broken Kubernetes manifest** in the source repository. The manifest appears to exist to test the ArgoCD notification system.

## Root Cause

**Problem:** Invalid `apiVersion` field in Deployment manifest
**Location:** `https://github.com/dcasati/argocd-notification-examples.git`
- File: `apps/broken-aks-store-all-in-one.yaml`
- Line: 178
- Current value: `apiVersion: apps/v` (incomplete)
- Expected value: `apiVersion: apps/v1` (complete)

**Affected Resource:** `order-service` Deployment

## Why This Matters

- Kubernetes identifies each resource by its `apiVersion` and `kind`; `apps/v` names no served API version, so the `order-service` Deployment cannot be mapped to a resource type
- ArgoCD validation fails before attempting to apply the resource
- The sync fails with a "synchronization tasks are not valid" error
- The application remains in "Degraded" health and "OutOfSync" status
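
A malformed `apiVersion` like this one can be caught before ArgoCD ever sees it. The following is a minimal sketch, not part of this repository: a hypothetical helper, `find_bad_api_versions`, that lists every `apiVersion` value that does not look like `[group/]vN`, `[group/]vNalphaM`, or `[group/]vNbetaM`. It is a purely textual check under the assumption that values are unquoted; it does not confirm that a version is actually served by a cluster.

```bash
#!/usr/bin/env sh
# Sketch: print "line:value" for each apiVersion that is not well formed.
# find_bad_api_versions is a hypothetical helper name.
# Exit status is 0 when at least one suspicious value is found, 1 when none are.
find_bad_api_versions() {
  manifest="$1"

  # Emit "line:value" for every apiVersion line, then drop the well-formed ones.
  awk '/^[[:space:]]*apiVersion:/ { print NR ":" $2 }' "$manifest" \
    | grep -Ev '^[0-9]+:([a-z0-9.-]+/)?v[0-9]+((alpha|beta)[0-9]+)?$'
}
```

Run against the manifest named above (`find_bad_api_versions apps/broken-aks-store-all-in-one.yaml`), a check of this kind should report the truncated `apps/v` value together with its line number.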

## Context

Based on analysis of the repository and commit history:

1. **Repository Name:** `argocd-notification-examples` - suggests this is a testing repository
2. **Commit Message:** "break apiVersion formatting in deployment YAML" - explicitly indicates intentional breakage
3. **Purpose:** This appears to be a test case to validate the ArgoCD notification system

**Result:** ✅ The notification system is working correctly. The automated workflow successfully detected the failure and created GitHub issue #12.

## Documentation Provided

All findings have been documented in:

1. **`Act-3/ROOT_CAUSE_ANALYSIS.md`** - Complete technical analysis with:
   - Detailed problem description
   - 4 remediation options
   - Verification steps
   - Investigation methodology

2. **`Act-3/HOW_TO_POST_RCA.md`** - Instructions for posting the analysis to GitHub issue

3. **`Act-3/post-rca-to-issue.sh`** - Bash script for automated posting (requires GitHub token)

4. **`.github/workflows/post-rca-comment.yml`** - GitHub Actions workflow for posting via UI

## Next Steps

### If This Is a Test (Most Likely)
- ✅ Mark test as successful - notification system is working
- Consider deleting the application: `argocd app delete 2-broken-apps`
- Update documentation about the test case

### If This Needs to Be Fixed
- Follow Option 1 in `ROOT_CAUSE_ANALYSIS.md` to fix the source repository
- Change line 178: `apiVersion: apps/v` → `apiVersion: apps/v1`
- Commit, push, and trigger ArgoCD sync
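
The one-line change can be scripted so that it touches only the truncated value. This is a minimal sketch under stated assumptions: `fix_api_version` is a hypothetical helper, and it writes through a temporary file instead of using `sed -i`, whose syntax differs between GNU and BSD `sed`. Lines that already read `apps/v1` are left unchanged.

```bash
#!/usr/bin/env sh
# Sketch: complete a truncated "apiVersion: apps/v" to "apiVersion: apps/v1".
# fix_api_version is a hypothetical helper name.
fix_api_version() {
  manifest="$1"
  tmp_file="$(mktemp)" || return 1

  # Match only lines that end in "apps/v" (trailing whitespace allowed),
  # then copy the result back so the original file keeps its permissions.
  sed -E 's#^([[:space:]]*apiVersion:[[:space:]]*apps/)v[[:space:]]*$#\1v1#' \
    "$manifest" > "$tmp_file" && cat "$tmp_file" > "$manifest"
  status=$?

  rm -f "$tmp_file"
  return "$status"
}
```

After running `fix_api_version apps/broken-aks-store-all-in-one.yaml` in a clone of the source repository, `git diff` should show a single changed line before the commit and push.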

## Verification

The notification workflow successfully:
1. Detected the ArgoCD sync failure
2. Extracted failure details including error message and revision
3. Triggered a GitHub `repository_dispatch` event
4. Created GitHub issue #12 with comprehensive failure information
5. Applied appropriate labels: `argocd-deployment-failure`, `automated`, `bug`

## Conclusion

- **Root Cause:** Intentionally malformed `apiVersion` field in test repository
- **System Status:** ArgoCD notification system is functioning correctly
- **Recommendation:** If testing is complete, delete the test application. Otherwise, fix the source repository manifest.

---

For complete technical details and remediation options, see `Act-3/ROOT_CAUSE_ANALYSIS.md`.
119 changes: 119 additions & 0 deletions Act-3/README.md
@@ -0,0 +1,119 @@
# Act 3: ArgoCD Deployment Failure Investigation

This directory contains the investigation results for the ArgoCD deployment failure of the `2-broken-apps` application.

## Quick Links

- 📊 **[Investigation Summary](INVESTIGATION_SUMMARY.md)** - Executive summary of findings
- 🔍 **[Root Cause Analysis](ROOT_CAUSE_ANALYSIS.md)** - Detailed technical analysis with remediation options
- 📝 **[How to Post RCA](HOW_TO_POST_RCA.md)** - Instructions for posting the analysis to GitHub issue #12

## Investigation Results

### Root Cause
Invalid Kubernetes manifest with incomplete `apiVersion` field in the source repository.

- **Location:** `apps/broken-aks-store-all-in-one.yaml` (line 178)
- **Issue:** `apiVersion: apps/v` (should be `apiVersion: apps/v1`)
- **Repository:** https://github.com/dcasati/argocd-notification-examples.git
- **Revision:** `8cd04df204028ff78613a69fdb630625864037c6`

### Conclusion
This appears to be an **intentional test case** to validate the ArgoCD notification system:
- ✅ The notification system detected the failure
- ✅ GitHub issue #12 was automatically created
- ✅ All error details were properly captured and reported

## Files in This Directory

| File | Description |
|------|-------------|
| `INVESTIGATION_SUMMARY.md` | Executive summary of the investigation |
| `ROOT_CAUSE_ANALYSIS.md` | Complete technical analysis with 4 remediation options |
| `HOW_TO_POST_RCA.md` | Instructions for posting analysis to GitHub issue |
| `post-rca-to-issue.sh` | Bash script for automated posting (requires GitHub token) |
| `argocd-test-app.yaml` | ArgoCD Application manifest (the one causing the issue) |

## Related Files

| File | Description |
|------|-------------|
| `../.github/workflows/post-rca-comment.yml` | GitHub Actions workflow for posting RCA to issue |
| `../.github/workflows/argocd-deployment-failure.yml` | Workflow that creates issues on ArgoCD failures |
| `../.github/argocd/argocd-notifications-config.yaml` | ArgoCD notification configuration |

## Remediation Options

The `ROOT_CAUSE_ANALYSIS.md` provides four options:

1. **Fix the source repository** (recommended if not a test)
2. **Use a different revision** (rollback to working commit)
3. **Use a different source repository** (point to valid repo)
4. **Delete the application** (if testing is complete)

## How to Use These Files

### To Post the Analysis to GitHub Issue #12

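Choose one of these methods, running the commands from the `Act-3` directory (the `post-rca-comment.yml` workflow listed under Related Files is a further option):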

```bash
# Option 1: Using GitHub CLI
gh issue comment 12 --body-file ROOT_CAUSE_ANALYSIS.md

# Option 2: Using the bash script (requires GITHUB_TOKEN)
export GITHUB_TOKEN="your_token_here"
./post-rca-to-issue.sh 12

# Option 3: Manual copy/paste
# Open ROOT_CAUSE_ANALYSIS.md and copy content to GitHub issue #12
```

### To Fix the Issue

If this is not a test and needs to be fixed:

```bash
# Clone the source repository
git clone https://github.com/dcasati/argocd-notification-examples.git
cd argocd-notification-examples

# Fix the apiVersion (GNU sed syntax; BSD/macOS sed needs `sed -i ''`)
sed -i 's/apiVersion: apps\/v$/apiVersion: apps\/v1/' apps/broken-aks-store-all-in-one.yaml

# Commit and push
git add apps/broken-aks-store-all-in-one.yaml
git commit -m "Fix: Complete apiVersion for order-service Deployment"
git push origin main

# Trigger ArgoCD sync
argocd app sync 2-broken-apps
```

## Background: ArgoCD Notifications

This investigation demonstrates the ArgoCD notification system working correctly:

```
ArgoCD detects failure
ArgoCD Notifications sends webhook
GitHub repository_dispatch
GitHub Actions creates issue
GitHub Copilot investigates
Root cause identified and documented
```

## Related Issues

- GitHub Issue #12: 🚨 ArgoCD Deployment Failed: 2-broken-apps
- GitHub Issue #11: 🚨 ArgoCD Deployment Failed: 2-broken-apps (duplicate)

---

**Investigation completed:** 2026-02-03
**Investigated by:** GitHub Copilot Agent