---
name: openstack-ci-analysis
description: Analyzes OpenStack CI job health, pass rates, coverage gaps, and failure categories. Use when asked to analyze CI jobs, generate CI health reports, compare platform performance, or investigate job failures for OpenStack/ShiftStack.
---

# OpenStack CI Analysis

Comprehensive analysis of OpenStack CI job health using Sippy API metrics and CI configuration data.

## Prerequisites

- Python 3.6+ with PyYAML: `pip install pyyaml`
- A local checkout of the openshift/release repository (the scripts read its `ci-operator/config` directory)
- Network access to the Sippy API (used by `fetch_job_metrics.py`)
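
The following is a minimal sketch for checking the Python side of these prerequisites before running anything. `missing_prerequisites` is a hypothetical helper, not one of the skill's scripts. It only checks the interpreter version and whether the `yaml` module (PyYAML) is importable.

```python
import importlib.util
import sys
from typing import List


def missing_prerequisites() -> List[str]:
    """Return human-readable problems with the local Python environment."""
    problems = []
    # The skill states Python 3.6+ as the minimum version
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required")
    # find_spec returns None when the module cannot be imported
    if importlib.util.find_spec("yaml") is None:
        problems.append("PyYAML is missing: pip install pyyaml")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```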

## Quick Start

Run the full analysis with the wrapper script:

```bash
bash scripts/run_analysis.sh \
  --config-dir /path/to/release/ci-operator/config \
  --output-dir /tmp/analysis
```

Add `--force` to re-fetch Sippy data instead of reusing the cached copy.

## Workflow

### Phase 1: Data Collection

Run these steps in order; later steps read files written by earlier ones (for example, `fetch_extended_metrics.py` reads `sippy_jobs_raw.json`):

```bash
# Paths used by every step below
export CONFIG_DIR=/path/to/release/ci-operator/config
export OUTPUT_DIR=/tmp/analysis

# 1. Extract job inventory from CI config YAML files
python3 scripts/extract_openstack_jobs.py \
  --config-dir "$CONFIG_DIR" \
  --output-dir "$OUTPUT_DIR" \
  --summary

# 2. Fetch pass rates from Sippy API
python3 scripts/fetch_job_metrics.py --output-dir "$OUTPUT_DIR"

# 3. Calculate 14-day combined metrics
python3 scripts/fetch_extended_metrics.py --output-dir "$OUTPUT_DIR"

# 4. Fetch platform comparison data
python3 scripts/fetch_comparison_data.py --output-dir "$OUTPUT_DIR"
```
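
A 14-day figure like the one in step 3 can be computed as a run-weighted combination of two consecutive 7-day windows. The sketch below illustrates that arithmetic under an assumption: each job record carries Sippy-style fields `current_runs`, `current_passes`, `previous_runs`, and `previous_passes` for the most recent and the preceding period. It is not the actual code in `fetch_extended_metrics.py`, and `combined_pass_rate` is a hypothetical name.

```python
from typing import Dict, Optional


def combined_pass_rate(job: Dict[str, int]) -> Optional[float]:
    """Run-weighted pass rate (percent) over two consecutive periods.

    Assumes Sippy-style fields: current_runs/current_passes for the most
    recent period and previous_runs/previous_passes for the one before it.
    Returns None when the job has no runs in either period.
    """
    runs = job.get("current_runs", 0) + job.get("previous_runs", 0)
    passes = job.get("current_passes", 0) + job.get("previous_passes", 0)
    if runs == 0:
        return None
    return 100.0 * passes / runs


if __name__ == "__main__":
    # Hypothetical job record: 6/8 passes this week, 2/6 the week before
    job = {"current_runs": 8, "current_passes": 6,
           "previous_runs": 6, "previous_passes": 2}
    print(f"{combined_pass_rate(job):.1f}%")  # 8/14 passes -> 57.1%
```

Weighting by run count keeps a week with few runs from skewing the result. Averaging the two weekly percentages (75% and 33.3%) would give 54.2% instead.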

### Phase 2: Configuration Analysis

These scripts read the job inventory (`openstack_jobs_inventory.json`) and do not depend on each other, so they can run in parallel:

```bash
python3 scripts/analyze_redundancy.py --output-dir "$OUTPUT_DIR"
python3 scripts/analyze_coverage.py --output-dir "$OUTPUT_DIR"
python3 scripts/analyze_triggers.py --output-dir "$OUTPUT_DIR"
```
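
One way to flag overlapping jobs is to group them by the settings that determine what they test. The sketch below is hypothetical. It assumes each inventory record has `name`, `release`, `workflow`, and `cluster_profile` keys, which may not match the real `openstack_jobs_inventory.json` schema or the criteria `analyze_redundancy.py` actually uses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical inventory record schema; adjust keys to the real inventory file
Job = Dict[str, str]


def find_overlapping_jobs(jobs: List[Job]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group job names that share release, workflow, and cluster profile.

    Only groups with more than one job are returned, since those are the
    candidates for consolidation.
    """
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for job in jobs:
        key = (job["release"], job["workflow"], job["cluster_profile"])
        groups[key].append(job["name"])
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Illustrative records; job names and workflows are placeholders
    inventory = [
        {"name": "job-a", "release": "4.16", "workflow": "example-ipi",
         "cluster_profile": "openstack-vexxhost"},
        {"name": "job-b", "release": "4.16", "workflow": "example-ipi",
         "cluster_profile": "openstack-vexxhost"},
        {"name": "job-c", "release": "4.16", "workflow": "example-nfv",
         "cluster_profile": "openstack-nfv"},
    ]
    print(find_overlapping_jobs(inventory))
    # {('4.16', 'example-ipi', 'openstack-vexxhost'): ['job-a', 'job-b']}
```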

### Phase 3: Runtime Analysis

These scripts read the metrics collected in Phase 1 (and, for `analyze_workflow_passrate.py`, the job inventory) and can also run in parallel:

```bash
python3 scripts/analyze_platform_comparison.py --output-dir "$OUTPUT_DIR"
python3 scripts/analyze_workflow_passrate.py --output-dir "$OUTPUT_DIR"
python3 scripts/categorize_failures.py --output-dir "$OUTPUT_DIR"
```
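
Because the scripts within Phase 2 and within Phase 3 are independent of each other, a small driver can launch each phase concurrently. This is a sketch, not part of the skill. `run_phase` is a hypothetical helper, and it assumes each script accepts `--output-dir` as shown above and exits non-zero on failure.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_phase(commands: List[List[str]]) -> Dict[str, int]:
    """Run independent commands concurrently and return each exit code.

    Threads are sufficient because each worker only waits on a child
    process. Result keys are the commands joined with spaces, in input order.
    """
    def run(cmd: List[str]) -> int:
        # Child output goes straight to this process's stdout/stderr
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=len(commands) or 1) as pool:
        codes = list(pool.map(run, commands))
    return {" ".join(cmd): code for cmd, code in zip(commands, codes)}
```

For Phase 2, the command list would be `[sys.executable, "scripts/analyze_redundancy.py", "--output-dir", out]` and the equivalent entries for `analyze_coverage.py` and `analyze_triggers.py`; any non-zero value in the returned dict marks a failed script.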

## Scripts Reference

| Script | Purpose | Required input |
|--------|---------|----------------|
| `extract_openstack_jobs.py` | Extract jobs from `ci-operator/config` | `--config-dir` |
| `fetch_job_metrics.py` | Fetch job metrics from the Sippy API | - |
| `fetch_extended_metrics.py` | Calculate 14-day combined metrics | `sippy_jobs_raw.json` |
| `fetch_comparison_data.py` | Fetch platform comparison data | - |
| `analyze_redundancy.py` | Find duplicate or overlapping jobs | `openstack_jobs_inventory.json` |
| `analyze_coverage.py` | Find coverage gaps across releases | `openstack_jobs_inventory.json` |
| `analyze_triggers.py` | Find trigger optimization opportunities | `openstack_jobs_inventory.json` |
| `analyze_platform_comparison.py` | Compare OpenStack with AWS/GCP/Azure | `platform_comparison_raw.json` |
| `analyze_workflow_passrate.py` | Compute pass rates by workflow type | `openstack_jobs_inventory.json`, `sippy_jobs_raw.json` |
| `categorize_failures.py` | Classify failures by root cause | `extended_metrics_jobs.json` |
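
The input column above lends itself to a preflight check that catches errors such as "No Sippy data found" or "No job inventory found" before a script runs. This is a hypothetical helper, not shipped with the skill. The `REQUIRED_INPUTS` mapping is copied from the table, with `openstack_jobs_inventory.json` assumed to be the inventory file and every input assumed to live directly in the output directory.

```python
import os
from typing import Dict, List

# Inputs per script, taken from the Scripts Reference table. Assumes every
# input file is written directly into the output directory.
REQUIRED_INPUTS: Dict[str, List[str]] = {
    "fetch_extended_metrics.py": ["sippy_jobs_raw.json"],
    "analyze_redundancy.py": ["openstack_jobs_inventory.json"],
    "analyze_coverage.py": ["openstack_jobs_inventory.json"],
    "analyze_triggers.py": ["openstack_jobs_inventory.json"],
    "analyze_platform_comparison.py": ["platform_comparison_raw.json"],
    "analyze_workflow_passrate.py": ["openstack_jobs_inventory.json",
                                     "sippy_jobs_raw.json"],
    "categorize_failures.py": ["extended_metrics_jobs.json"],
}


def missing_inputs(script: str, output_dir: str) -> List[str]:
    """Return the required input files for `script` not yet present in output_dir."""
    return [name for name in REQUIRED_INPUTS.get(script, [])
            if not os.path.isfile(os.path.join(output_dir, name))]
```

A non-empty result, for example from `missing_inputs("categorize_failures.py", os.environ["OUTPUT_DIR"])`, means the Phase 1 outputs that script needs are not there yet.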

## Output Files

All files are written to the directory passed as `--output-dir`.

### Reports (Markdown)

| File | Contents |
|------|----------|
| `extended_metrics_report.md` | Overall health, trends, problem jobs |
| `platform_comparison_report.md` | OpenStack vs other platforms |
| `workflow_passrate_report.md` | Pass rates by workflow |
| `failure_categories_report.md` | Failures by root cause |
| `coverage_gaps_report.md` | Missing test coverage |
| `trigger_optimization_report.md` | Trigger improvements |
| `redundant_jobs_report.md` | Consolidation opportunities |

### Data (JSON)

| File | Contents |
|------|----------|
| `openstack_jobs_inventory.json` | Complete job inventory |
| `sippy_jobs_raw.json` | Cached Sippy data |
| `extended_metrics.json` | Combined metrics |
| `platform_comparison_analysis.json` | Platform analysis |
| `failure_categories.json` | Categorized failures |

## Generating Executive Summary

After all scripts have run, print the headline metrics from the JSON outputs (set `OUTPUT_DIR` to the directory passed as `--output-dir`):

```python
import json
import os


def load(name):
    """Load one JSON output file from $OUTPUT_DIR (default: current directory)."""
    path = os.path.join(os.environ.get('OUTPUT_DIR', '.'), name)
    with open(path) as f:
        return json.load(f)


ext = load('extended_metrics.json')
plat = load('platform_comparison_analysis.json')
fail = load('failure_categories.json')

print(f"Pass rate: {ext['overall']['combined_pass_rate']:.1f}%")
print(f"Problem jobs: {ext['overall']['problem_job_count']}")
pos = plat['openstack_position']
print(f"OpenStack rank: #{pos['rank']}/{pos['total']}")

print("\nFailure Categories:")
for cat, count in fail['summary']['by_category'].items():
    pct = fail['summary']['percentages'][cat]
    print(f"  {cat}: {count} ({pct}%)")
```

## Cluster Profiles Analyzed

- openstack-vexxhost
- openstack-vh-mecha-central
- openstack-vh-mecha-az0
- openstack-vh-bm-rhos
- openstack-hwoffload
- openstack-nfv
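
In ci-operator configuration, multi-stage tests name their cluster profile under `tests[].steps.cluster_profile`. The sketch below shows how tests could be selected by the profiles listed above from an already-parsed config (for example, the result of `yaml.safe_load` on one config file). It illustrates that assumption only; it is not the logic of `extract_openstack_jobs.py`, and `select_openstack_tests` is a hypothetical name.

```python
from typing import Any, Dict, List

OPENSTACK_PROFILES = {
    "openstack-vexxhost",
    "openstack-vh-mecha-central",
    "openstack-vh-mecha-az0",
    "openstack-vh-bm-rhos",
    "openstack-hwoffload",
    "openstack-nfv",
}


def select_openstack_tests(config: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return name, cluster profile, and workflow for tests on OpenStack profiles.

    Tests without a `steps` section (or without a cluster_profile) are skipped.
    """
    selected = []
    for test in config.get("tests") or []:
        steps = test.get("steps") or {}
        profile = steps.get("cluster_profile")
        if profile in OPENSTACK_PROFILES:
            selected.append({
                "name": test.get("as", ""),
                "cluster_profile": profile,
                "workflow": steps.get("workflow", ""),
            })
    return selected


if __name__ == "__main__":
    # Hypothetical parsed config, shaped like a ci-operator config file
    config = {"tests": [
        {"as": "e2e-openstack", "steps": {"cluster_profile": "openstack-vexxhost",
                                          "workflow": "example-workflow"}},
        {"as": "e2e-aws", "steps": {"cluster_profile": "aws"}},
        {"as": "unit", "commands": "make test"},
    ]}
    print(select_openstack_tests(config))
```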

## Failure Categories

| Category | Criteria |
|----------|----------|
| Infrastructure | Low pass rate on install/provision jobs |
| Flaky | Pass rate between 30% and 70% (inconsistent results) |
| Product Bug | Low pass rate with bugs filed |
| Needs Triage | Cause unknown; requires investigation |
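
The criteria above leave "low" and the order of checks open. The sketch below assumes "low" means below 30% (so it does not overlap the Flaky band), checks the Flaky band first, then filed bugs, then install/provision jobs, and takes hypothetical boolean inputs `is_install_job` and `has_bugs`. `categorize_failures.py` may decide differently.

```python
from typing import Optional

LOW_PASS_RATE = 30.0  # assumed threshold for "low"; not specified by the skill
FLAKY_MAX = 70.0      # upper bound of the 30-70% Flaky band


def categorize(pass_rate: float, is_install_job: bool, has_bugs: bool) -> Optional[str]:
    """Map a job's pass rate (percent) to a failure category, or None if healthy.

    Check order is an assumption: Flaky band first, then filed bugs, then
    install/provision jobs, falling back to Needs Triage.
    """
    if pass_rate >= FLAKY_MAX:
        return None
    if pass_rate >= LOW_PASS_RATE:
        return "Flaky"
    if has_bugs:
        return "Product Bug"
    if is_install_job:
        return "Infrastructure"
    return "Needs Triage"
```

For example, `categorize(12.0, is_install_job=True, has_bugs=False)` returns `"Infrastructure"` and `categorize(45.0, False, False)` returns `"Flaky"`.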

## Troubleshooting

| Error | Solution |
|-------|----------|
| "No Sippy data found" | Run `fetch_job_metrics.py` first |
| "No job inventory found" | Run `extract_openstack_jobs.py` first |
| Import error for `yaml` | Install PyYAML: `pip install pyyaml` |
| Config directory not found | Pass `--config-dir` the path to `ci-operator/config` in your openshift/release checkout |