4 changes: 2 additions & 2 deletions .github/copilot-instructions.md
@@ -192,9 +192,9 @@ This pattern ensures proper encoding, timestamps, and file attributes are handle

## CI / Build Investigation

**dotnet/android's primary CI runs on Azure DevOps (internal), not GitHub Actions.** When a user asks about CI status, CI failures, why a PR is blocked, or build errors:
**dotnet/android PR validation runs on the public Azure DevOps `dotnet-android` pipeline on `dnceng-public`, not GitHub Actions.** When a user asks about CI status, CI failures, why a PR is blocked, or build errors:

1. **ALWAYS invoke the `ci-status` skill first** — do NOT rely on `gh pr checks` alone. GitHub checks may all show ✅ while the internal Azure DevOps build is failing.
1. **ALWAYS invoke the `ci-status` skill first.** The pipeline surfaces as ~39 `dotnet-android (...)` GitHub checks, but the skill adds build progress, ETA, per-stage failures, and failed-test names that `gh pr checks` alone doesn't give you.
2. The skill auto-detects the current PR from the git branch when no PR number is given.
3. For deep .binlog analysis, use the `azdo-build-investigator` skill.
4. Only after the skill confirms no Azure DevOps failures should you report CI as passing.
1 change: 0 additions & 1 deletion .github/skills/android-reviewer/SKILL.md
@@ -57,7 +57,6 @@ Review the CI results. **Never post ✅ LGTM if any required CI check is failing
- Investigate the failure using the **azdo-build-investigator** skill (for Azure DevOps pipeline failures) or GitHub Actions job logs.
- If the failure is caused by the PR's code changes, flag it as ❌ error.
- If the failure is a known infrastructure issue or pre-existing flake unrelated to the PR, note it in the summary but still use ⚠️ Needs Changes — the PR isn't mergeable until CI is green.
- If **all public CI checks pass** but only the internal `Xamarin.Android-PR` check is failing, still use ⚠️ Needs Changes with a note that the internal pipeline may need a re-run. Do not give ✅ LGTM.
- If the PR description acknowledges the failure and documents a dependency (e.g., "blocked on X"), note it in the summary.

### 5. Load review rules
355 changes: 124 additions & 231 deletions .github/skills/ci-status/SKILL.md

Large diffs are not rendered by default.

88 changes: 88 additions & 0 deletions .github/skills/ci-status/references/azdo-queries.md
@@ -0,0 +1,88 @@
# AZDO queries (dnceng-public)

Deeper `az` commands for the `dotnet-android` build, beyond the core ones in SKILL.md. Shared setup:

```bash
ORG=https://dev.azure.com/dnceng-public; PROJECT=public
RES=499b84ac-1321-427f-aa17-267ca6975798 # Azure DevOps app id, for `az rest --resource`
```

`az devops invoke` works unauthenticated for the `build` area, but its `test` area is broken (404), so test data goes through `az rest` instead. `az rest` and artifact/log downloads need `az login`.

## ETA for an in-progress build

Duration is dominated by hosted-agent queue time (same ~38 jobs every run, yet ~50 min to ~3 h+). Pull recent green runs of def `333`, take the **median** duration, `ETA = startTime + median`; present it as a rough window.

```bash
az devops invoke --area build --resource builds --org $ORG \
--route-parameters project=$PROJECT \
--query-parameters "definitions=333&statusFilter=completed&resultFilter=succeeded&\$top=10" \
--query "value[].{start:startTime, finish:finishTime}" -o json
```
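The median-based estimate can be sketched in Python as follows. This is a minimal illustration, not part of the skill: `parse_ts` and `estimate_eta` are hypothetical helper names, the input is assumed to be the `[{start, finish}, ...]` list produced by the query above, and timestamps are assumed to be UTC ISO-8601 strings with optional fractional seconds.

```python
from datetime import datetime, timedelta
from statistics import median


def parse_ts(ts):
    """Parse a timestamp such as '2024-05-01T10:00:00.1234567Z' (assumed format).

    Only the first 19 characters (whole seconds) are used, so fractional
    seconds and the trailing 'Z' are ignored; the result is a naive UTC datetime.
    """
    return datetime.strptime(ts[:19], "%Y-%m-%dT%H:%M:%S")


def estimate_eta(recent, start_time):
    """Return start_time + median duration of the recent green runs, or None."""
    durations = [
        (parse_ts(b["finish"]) - parse_ts(b["start"])).total_seconds()
        for b in recent
        if b.get("start") and b.get("finish")  # skip runs with missing timestamps
    ]
    if not durations:
        return None
    return parse_ts(start_time) + timedelta(seconds=median(durations))
```

Because queue time varies so widely, the result is best reported as a rough window around the returned time rather than as a single point.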

## Failed-test error message / stack trace

`ResultsByBuild` (SKILL.md) gives the names + `runId`. For messages, list the run's failed results — the single-result-by-`testId` route returns null here. Repeat per distinct `runId`:

```bash
az rest --method get --resource $RES \
  --url "$ORG/$PROJECT/_apis/test/Runs/$RUN_ID/results?api-version=7.1&outcomes=Failed&\$top=20" \
  --query "value[].{test:testCaseTitle, error:errorMessage, stack:stackTrace}" -o json
```

## Per-flavor test breakdown — fields & run → job mapping

The breakdown in SKILL.md fetches `/tmp/runs.json` from `/_apis/test/runs?...&includeRunDetails=true`. Field meanings per run (one run = one test *flavor*, e.g. `Mono.Android.NET_Tests-NativeAOT`):

| Field | Source | Meaning |
|-------|--------|---------|
| `total` | `totalTests` | all tests in the run |
| `passed` | `passedTests` | passed |
| `failed` | `unanalyzedTests` | failed/aborted |
| `skipped` | `notApplicableTests` | skipped / inconclusive |
| `phase` | `pipelineReference.phaseReference.phaseName` | the pipeline phase the run belongs to |

`run.phase` equals a timeline **Phase** record's `refName`; that record's `name` is the human lane — e.g. `mac_apk_tests_net_2` → `macOS > Tests > APKs 2`. That join (`runs` × timeline phases) is what the breakdown `jq` does. **Matrix lanes that share one phase** (e.g. all `MSBuild+Emulator N` jobs are phase `mac_dotnetdevice_tests`) aggregate into a single breakdown block — use the per-job timing table to see which numbered job actually failed/timed out.
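The join described above can be sketched in Python. This is an illustration only, not the `jq` from SKILL.md: `lanes_by_phase` and `breakdown_by_lane` are hypothetical names, `runs` is assumed to be the `value` array of the test-runs response, and `timeline` is assumed to be the build timeline object with a `records` list.

```python
def lanes_by_phase(timeline):
    """Map each Phase record's refName to its human-readable lane name."""
    return {
        rec["refName"]: rec["name"]
        for rec in timeline.get("records", [])
        if rec.get("type") == "Phase" and rec.get("refName")
    }


def breakdown_by_lane(runs, timeline):
    """Aggregate per-run counts by lane; runs that share a phase are summed."""
    lanes = lanes_by_phase(timeline)
    out = {}
    for run in runs:
        phase_ref = (run.get("pipelineReference") or {}).get("phaseReference") or {}
        phase = phase_ref.get("phaseName")
        # Fall back to the raw phase name when no timeline record matches.
        lane = lanes.get(phase, phase or "(unknown phase)")
        agg = out.setdefault(lane, {"total": 0, "passed": 0, "failed": 0, "skipped": 0})
        agg["total"] += run.get("totalTests") or 0
        agg["passed"] += run.get("passedTests") or 0
        agg["failed"] += run.get("unanalyzedTests") or 0
        agg["skipped"] += run.get("notApplicableTests") or 0
    return out
```

Summing by lane reproduces the aggregation noted above: every matrix job in one phase lands in the same block, so the per-job timing table is still needed to tell the numbered jobs apart.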

Quick per-run counts without the join:

```bash
az rest --method get --resource $RES \
--url "$ORG/$PROJECT/_apis/test/runs?buildUri=vstfs:///Build/Build/$BUILD_ID&api-version=7.1&includeRunDetails=true" \
--query "value[].{name:name, total:totalTests, passed:passedTests, failed:unanalyzedTests, skipped:notApplicableTests}" -o json
```

To enrich the breakdown with the **actual error message** under each failed test, replace `/tmp/failed.json` with per-run results that include `errorMessage` (the "Failed-test error message" query above) — key them by `runId` the same way the breakdown's `$ft` lookup does.

## Fetch a failed task's log

Take `log.id` from a `records[?result=='failed']` timeline entry, then fetch the log with `az rest` (which, like the other `az rest` calls here, needs `az login`):

```bash
az rest --method get --resource $RES \
--url "$ORG/$PROJECT/_apis/build/builds/$BUILD_ID/logs/$LOG_ID?api-version=7.1" --output-file "/tmp/azdo-$LOG_ID.log"
```

The per-flavor `run <flavor>` task log holds the MTP summary (`Test run summary: Zero tests ran` ⇒ the app crashed at startup); the per-test lifecycle and native crash are **not** here — they are in logcat (below).

## Crash culprit from logcat

`scripts/ci_failures.py` flags crashed/incomplete/timed-out lanes, but the culprit test is only in the device **logcat**, published inside that lane's `Test Results - ...` build artifact (100 MB–2 GB — prefer the smaller `Debug` lane). Download it, then scan `logcat-<flavor>.txt`:

```bash
# list artifacts + sizes to pick the failing lane:
az rest --method get --resource $RES \
--url "$ORG/$PROJECT/_apis/build/builds/$BUILD_ID/artifacts?api-version=7.1" \
--query "value[].{name:name, mb:(resource.properties.artifactsize)}" -o json

az pipelines runs artifact download --run-id $BUILD_ID --org $ORG --project $PROJECT \
--artifact-name "Test Results - APKs .NET Debug - macOS 1" --path /tmp/cilogs

# The crasher is the LAST test that logged a start with no matching pass/fail,
# usually right before a native signal:
grep -nE 'Running |\[PASS\]|\[FAIL\]|SIGSEGV|SIGABRT|tombstone|FATAL|art::|JNI DETECTED|Process .* died' \
/tmp/cilogs/**/logcat-*.txt | tail -60
```

For a `Zero tests ran` lane the crash is at app startup (look for the first `SIGSEGV`/`tombstone`/`JNI DETECTED ERROR`, not a specific test); for a timeout the suspect is the last `Running <test>` with no result.
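The "last test that started with no result" rule can be sketched in Python. This is a rough illustration under loud assumptions: the real logcat line format is not specified here, so the sketch assumes a start line contains `Running <testName>` and a result line contains `[PASS] <testName>` or `[FAIL] <testName>`, with the test name as a single whitespace-free token. `find_suspect` and the three patterns are hypothetical and would need adjusting to the actual log format.

```python
import re

# Assumed (hypothetical) line shapes; adjust to the real logcat output.
START = re.compile(r"Running (\S+)")
DONE = re.compile(r"\[(?:PASS|FAIL)\]\s+(\S+)")
CRASH = re.compile(r"SIGSEGV|SIGABRT|tombstone|JNI DETECTED")


def find_suspect(lines):
    """Return (last test started without a result, line number of first crash marker)."""
    pending = []        # tests that logged a start but no pass/fail yet, in start order
    first_crash = None  # 1-based line number of the first native crash marker
    for number, line in enumerate(lines, 1):
        if first_crash is None and CRASH.search(line):
            first_crash = number
        done = DONE.search(line)
        if done:
            if done.group(1) in pending:
                pending.remove(done.group(1))
            continue
        started = START.search(line)
        if started:
            pending.append(started.group(1))
    return (pending[-1] if pending else None), first_crash
```

A result of `(None, n)` corresponds to the `Zero tests ran` case (crash at startup, no test to blame), while `(name, None)` matches a hang or timeout with no native signal.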
6 changes: 5 additions & 1 deletion .github/skills/ci-status/references/binlog-analysis.md
@@ -20,7 +20,11 @@ az pipelines runs artifact list --run-id $BUILD_ID --org $ORG_URL --project $PRO
az pipelines runs artifact list --run-id $BUILD_ID --org $ORG_URL --project $PROJECT --output json
```

Look for artifact names containing `binlog`, `msbuild`, or `build-log`.
Look for artifact names that contain build logs. On the `dotnet-android` (dnceng-public) pipeline the relevant ones are:
- `Build Results - macOS` / `Build Results - Windows` / `Build Results - Linux` — contain the `.binlog` files (published mainly when a build stage fails or when `XA.PublishAllLogs` is set).
- `Test Results - ...` — per-test-stage logs and artifacts. For the on-device `Package Tests` (APKs) stage these also include each device test's `build-<testName>.binlog`, `run-<testName>.binlog`, the `.trx`, and `logcat-<testName>.txt` (essential for native/JNI crash diagnosis).

If a green build has no `Build Results - *` artifact, the binlogs weren't published; re-run with `XA.PublishAllLogs` or rely on the timeline/test queries instead.

### Download

2 changes: 1 addition & 1 deletion .github/skills/ci-status/references/error-patterns.md
@@ -36,7 +36,7 @@ These are CI environment issues, not code problems.
| Network | `Unable to load the service index`, `Connection refused` |
| NuGet feed | `NU1301` (feed connectivity) |
| Agent issues | `The agent did not connect`, `##[error] The job was canceled` |
| Timeout (job-level) | Job canceled after 55+ minutes |
| Timeout (job-level) | `result: canceled` + `issues[]` says *"ran longer than the maximum time of N minutes"* |

## Decision Tree

213 changes: 213 additions & 0 deletions .github/skills/ci-status/scripts/ci_failures.py
@@ -0,0 +1,213 @@
#!/usr/bin/env python3
"""Enriched failure analysis for one dnceng-public `dotnet-android` build:
  1. cross-config matrix per failed test (failed/passed/retried configs) + stack/asserts
  2. crashed / incomplete lanes (started-but-not-finished culprit lives in logcat)
  3. branch cross-reference (PR changes that name a failing test's class/namespace/assembly)

Needs `az login`. Usage: ci_failures.py --build-id N [--pr N] [--repo dotnet/android]
"""
import json, subprocess, sys, argparse, re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

ORG = "https://dev.azure.com/dnceng-public"
PROJECT = "public"
RES = "499b84ac-1321-427f-aa17-267ca6975798"


def az_json(url):
    p = subprocess.run(["az", "rest", "--method", "get", "--resource", RES,
                        "--url", url, "-o", "json"], capture_output=True, text=True)
    if p.returncode != 0:
        sys.stderr.write(f"az error {url}\n{p.stderr[:300]}\n")
        return None
    try:
        return json.loads(p.stdout)
    except json.JSONDecodeError:
        return None


def run_results(rid):
    data = az_json(f"{ORG}/{PROJECT}/_apis/test/Runs/{rid}/results?api-version=7.1&$top=5000")
    out = {}
    for row in (data or {}).get("value", []):
        n = row.get("automatedTestName")
        if n:
            out[n] = (row.get("outcome"), row.get("errorMessage"), row.get("stackTrace"))
    return rid, out


def fetch_all(rids, workers=6):
    if not rids:
        return {}
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return dict(ex.map(run_results, rids))


def base_of(name):
    """Strip flavor/OS/index suffix so sibling configs share one base.
    'Mono.Android.NET_Tests-NativeAOT' -> 'Mono.Android.NET_Tests';
    'Xamarin.Android.Build.Tests - macOS-7' -> 'Xamarin.Android.Build.Tests'."""
    b = re.sub(r' - (macOS|Windows|Linux)(-\d+)?$', '', name)
    b = re.sub(r'-[A-Za-z0-9]+$', '', b)
    return b


# ---------------- section 1: cross-config matrix ----------------
def section_matrix(bid, failed, runs, run_by_id):
    fail_runs, storage = defaultdict(set), {}
    for f in failed:
        fail_runs[f["automatedTestName"]].add(f["runId"])
        storage[f["automatedTestName"]] = f.get("automatedTestStorage")

    def first_base(rids):
        for r in rids:
            if r in run_by_id:
                return base_of(run_by_id[r]["name"])
        return ""
    fam = {n: first_base(rids) for n, rids in fail_runs.items()}
    cand = defaultdict(list)
    for fk in set(fam.values()):
        for r in runs:
            if base_of(r["name"]) == fk:
                cand[fk].append(r)
    cache = fetch_all(list({r["id"] for fk in fam.values() for r in cand[fk]}))

    print(f"## Failed-test cross-config matrix - {len(fail_runs)} distinct test(s)\n")
    for n in sorted(fail_runs):
        fk = fam[n]
        cfg = defaultdict(list)
        for r in cand[fk]:
            row = cache.get(r["id"], {}).get(n)
            if row:
                cfg[r["name"]].append((r.get("completedDate") or "", row[0]))
        short, ns = n.rsplit(".", 1)[-1], n.rsplit(".", 1)[0]
        print(f"### `{short}` ({ns})")
        print(f"- assembly `{storage.get(n)}` · family `{fk}`")
        fl, pa, ot = [], [], []
        for name in sorted(cfg):
            outs = [o for _, o in sorted(cfg[name])]
            label = name[len(fk):].lstrip(" -") or name
            disp = "->".join(outs) + " (retry)" if len(set(outs)) > 1 else outs[0]
            (fl if "Failed" in outs else pa if set(outs) == {"Passed"} else ot).append(
                f"`{label}`" + ("" if disp == "Passed" else f" ({disp})"))
        print(f"- FAILED in: {', '.join(fl) or '-'}")
        print(f"- passed in: {', '.join(pa) or '-'}")
        if ot:
            print(f"- other: {', '.join(ot)}")
        for rid in fail_runs[n]:
            row = cache.get(rid, {}).get(n)
            if row and row[1]:
                print(f"- assert/error: {row[1].strip().splitlines()[0][:300]}")
                if row[2]:
                    print(" ```")
                    for ln in row[2].strip().splitlines()[:6]:
                        print(" " + ln[:200])
                    print(" ```")
                break
        print()


# ---------------- section 2: crashed / incomplete lanes ----------------
def section_crashes(bid, runs, timeline):
    recs = timeline.get("records", [])
    published = {r["name"]: r for r in runs}
    crashed = []
    # incomplete test runs (runner died mid-run)
    for r in runs:
        inc = r.get("incompleteTests") or 0
        if inc > 0:
            crashed.append((r["name"], f"{inc} test(s) did not complete - runner died mid-run"))
    # "run <flavor>" tasks that did not cleanly succeed AND published no (complete) results = crash/zero-tests
    for rec in recs:
        if rec.get("type") == "Task" and (rec.get("name") or "").startswith("run ") \
                and rec.get("result") in ("failed", "succeededWithIssues", "canceled"):
            flavor = rec["name"][4:].strip()
            run = published.get(flavor)
            if run is None or (run.get("incompleteTests") or 0) > 0:
                crashed.append((flavor, f"`run` task {rec['result']} but no complete test run published - app likely crashed ('Zero tests ran' / native crash)"))
    # job-level timeouts (hang)
    for rec in recs:
        if rec.get("type") == "Job" and rec.get("result") == "canceled":
            msg = " ".join(i.get("message", "") for i in (rec.get("issues") or []))
            m = re.search(r"maximum time of (\d+) minutes", msg)
            if m:
                crashed.append((rec["name"], f"timed out at {m.group(1)}-min cap - likely a hung test; last started test in logcat is the suspect"))
    if not crashed:
        return
    print("## Crashed / incomplete lanes (!)\n")
    print("These went red with **no usable failed-test list** - the culprit (a test that **started but never "
          "finished**, or a native crash) is only in the device **logcat**, not the test API:\n")
    seen = set()
    for name, why in crashed:
        if (name, why) in seen:
            continue
        seen.add((name, why))
        print(f"- **{name}** - {why}")
    print()
    print("To name the culprit, download that lane's logs artifact (large: 100MB-2GB - prefer the `Debug` lane) "
          "and scan its logcat (see references/azdo-queries.md):\n")
    print("```bash")
    print(f'az pipelines runs artifact download --run-id {bid} --org {ORG} --project {PROJECT} \\')
    print(' --artifact-name "Test Results - APKs .NET Debug - macOS 1" --path /tmp/cilogs')
    # A raw string cannot end in a single backslash, and r"\\" would print two,
    # so the shell line-continuation backslash is appended as a normal string.
    print(r"grep -nE 'Running |\[PASS\]|\[FAIL\]|SIGSEGV|SIGABRT|tombstone|FATAL|art::|JNI DETECTED|Process .*died' " + "\\")
    print(' /tmp/cilogs/**/logcat-*.txt | tail -60 # last test that STARTED with no PASS/FAIL = crasher')
    print("```\n")


# ---------------- section 3: branch cross-reference ----------------
def section_xref(failed, repo, pr):
    names = sorted({f["automatedTestName"] for f in failed})
    if not names:
        return
    p = subprocess.run(["gh", "pr", "diff", str(pr), "--repo", repo, "--name-only"],
                       capture_output=True, text=True)
    if p.returncode != 0:
        sys.stderr.write(f"gh diff failed: {p.stderr[:200]}\n")
        return
    files = [f for f in p.stdout.splitlines() if f.strip()]
    stems = {f.rsplit("/", 1)[-1].rsplit(".", 1)[0]: f for f in files}
    print("## Branch cross-reference\n")
    print(f"PR #{pr} changes {len(files)} file(s). Name overlaps with failing tests (judge if causal):\n")
    any_hit = False
    for n in names:
        parts = n.split(".")
        cls = parts[-2] if len(parts) >= 2 else ""
        method, ns = parts[-1], ".".join(parts[:-2])
        hits = set()
        for stem, path in stems.items():
            if stem and (stem == cls or stem == method or stem in ns.split(".") or (cls and cls in path)):
                hits.add(path)
        if hits:
            any_hit = True
            print(f"- `{cls}.{method}` <- {', '.join('`'+h+'`' for h in sorted(hits)[:5])}")
    if not any_hit:
        print("- No direct file-name overlap. Check whether changed runtime/build code affects the failing assembly.")
    print()


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--build-id", required=True)
    ap.add_argument("--pr")
    ap.add_argument("--repo", default="dotnet/android")
    args = ap.parse_args()
    bid = args.build_id

    failed = (az_json(f"{ORG}/{PROJECT}/_apis/test/ResultsByBuild?buildId={bid}&outcomes=Failed&api-version=7.1-preview") or {}).get("value", [])
    runs = (az_json(f"{ORG}/{PROJECT}/_apis/test/runs?buildUri=vstfs:///Build/Build/{bid}&api-version=7.1&includeRunDetails=true") or {}).get("value", [])
    timeline = az_json(f"{ORG}/{PROJECT}/_apis/build/builds/{bid}/timeline?api-version=7.1") or {}
    run_by_id = {r["id"]: r for r in runs}

    print(f"# Failure analysis - build {bid}\n")
    if failed:
        section_matrix(bid, failed, runs, run_by_id)
    else:
        print("_No failed tests in the test API (build may still be red via crash/timeout below)._\n")
    section_crashes(bid, runs, timeline)
    if args.pr:
        section_xref(failed, args.repo, args.pr)


if __name__ == "__main__":
    main()