dotnet · simonrozsival · Jun 16, 2026 · Jun 16, 2026 · Jun 16, 2026 · Jun 17, 2026
@@ -192,9 +192,9 @@ This pattern ensures proper encoding, timestamps, and file attributes are handle
 
 ## CI / Build Investigation
 
-**dotnet/android's primary CI runs on Azure DevOps (internal), not GitHub Actions.** When a user asks about CI status, CI failures, why a PR is blocked, or build errors:
+**dotnet/android PR validation runs on the public Azure DevOps `dotnet-android` pipeline on `dnceng-public`, not GitHub Actions.** When a user asks about CI status, CI failures, why a PR is blocked, or build errors:
 
-1. **ALWAYS invoke the `ci-status` skill first** — do NOT rely on `gh pr checks` alone. GitHub checks may all show ✅ while the internal Azure DevOps build is failing.
+1. **ALWAYS invoke the `ci-status` skill first.** The pipeline surfaces as ~39 `dotnet-android (...)` GitHub checks, but the skill adds build progress, ETA, per-stage failures, and failed-test names that `gh pr checks` alone doesn't give you.
 2. The skill auto-detects the current PR from the git branch when no PR number is given.
 3. For deep .binlog analysis, use the `azdo-build-investigator` skill.
 4. Only after the skill confirms no Azure DevOps failures should you report CI as passing.

@@ -57,7 +57,6 @@ Review the CI results. **Never post ✅ LGTM if any required CI check is failing
 - Investigate the failure using the **azdo-build-investigator** skill (for Azure DevOps pipeline failures) or GitHub Actions job logs.
 - If the failure is caused by the PR's code changes, flag it as ❌ error.
 - If the failure is a known infrastructure issue or pre-existing flake unrelated to the PR, note it in the summary but still use ⚠️ Needs Changes — the PR isn't mergeable until CI is green.
-- If **all public CI checks pass** but only the internal `Xamarin.Android-PR` check is failing, still use ⚠️ Needs Changes with a note that the internal pipeline may need a re-run. Do not give ✅ LGTM.
 - If the PR description acknowledges the failure and documents a dependency (e.g., "blocked on X"), note it in the summary.
 
 ### 5. Load review rules

@@ -0,0 +1,88 @@
+# AZDO queries (dnceng-public)
+
+Deeper `az` commands for the `dotnet-android` build, beyond the core ones in SKILL.md. Shared setup:
+
+```bash
+ORG=https://dev.azure.com/dnceng-public; PROJECT=public
+RES=499b84ac-1321-427f-aa17-267ca6975798   # Azure DevOps app id, for `az rest --resource`
+```
+
+`az devops invoke --area build` works unauthenticated. The `test` area of `az devops invoke` is broken (404), so test data goes through `az rest` instead. Both `az rest` and artifact/log downloads need `az login`.
+
+## ETA for an in-progress build
+
+Duration is dominated by hosted-agent queue time: the same ~38 jobs run every time, yet a build takes anywhere from ~50 min to 3 h+. Pull recent green runs of definition `333`, take the **median** duration, and compute `ETA = startTime + median`; present it as a rough window.
+
+```bash
+az devops invoke --area build --resource builds --org $ORG \
+  --route-parameters project=$PROJECT \
+  --query-parameters "definitions=333&statusFilter=completed&resultFilter=succeeded&\$top=10" \
+  --query "value[].{start:startTime, finish:finishTime}" -o json
+```
+
+## Failed-test error message / stack trace
+
+`ResultsByBuild` (SKILL.md) gives the names + `runId`. For messages, list the run's failed results — the single-result-by-`testId` route returns null here. Repeat per distinct `runId`:
+
+```bash
+# `az devops invoke --area test` returns 404 (see the setup note), so use `az rest`:
+az rest --method get --resource $RES \
+  --url "$ORG/$PROJECT/_apis/test/Runs/$RUN_ID/results?outcomes=Failed&\$top=20&api-version=7.1" \
+  --query "value[].{test:testCaseTitle, error:errorMessage, stack:stackTrace}" -o json
+```
+
+## Per-flavor test breakdown — fields & run → job mapping
+
+The breakdown in SKILL.md fetches `/tmp/runs.json` from `/_apis/test/runs?...&includeRunDetails=true`. Field meanings per run (one run = one test *flavor*, e.g. `Mono.Android.NET_Tests-NativeAOT`):
+
+| Field | Source | Meaning |
+|-------|--------|---------|
+| `total` | `totalTests` | all tests in the run |
+| `passed` | `passedTests` | passed |
+| `failed` | `unanalyzedTests` | failed/aborted |
+| `skipped` | `notApplicableTests` | skipped / inconclusive |
+| `phase` | `pipelineReference.phaseReference.phaseName` | the pipeline phase the run belongs to |
+
+`run.phase` equals a timeline **Phase** record's `refName`; that record's `name` is the human lane — e.g. `mac_apk_tests_net_2` → `macOS > Tests > APKs 2`. That join (`runs` × timeline phases) is what the breakdown `jq` does. **Matrix lanes that share one phase** (e.g. all `MSBuild+Emulator N` jobs are phase `mac_dotnetdevice_tests`) aggregate into a single breakdown block — use the per-job timing table to see which numbered job actually failed/timed out.
+
+Quick per-run counts without the join:
+
+```bash
+az rest --method get --resource $RES \
+  --url "$ORG/$PROJECT/_apis/test/runs?buildUri=vstfs:///Build/Build/$BUILD_ID&api-version=7.1&includeRunDetails=true" \
+  --query "value[].{name:name, total:totalTests, passed:passedTests, failed:unanalyzedTests, skipped:notApplicableTests}" -o json
+```
+
+To enrich the breakdown with the **actual error message** under each failed test, replace `/tmp/failed.json` with per-run results that include `errorMessage` (the "Failed-test error message" query above) — key them by `runId` the same way the breakdown's `$ft` lookup does.
+
+## Fetch a failed task's log
+
+Take `log.id` from a `records[?result=='failed']` timeline entry, then fetch it with `az rest` (needs `az login`, like the other `az rest` calls):
+
+```bash
+az rest --method get --resource $RES \
+  --url "$ORG/$PROJECT/_apis/build/builds/$BUILD_ID/logs/$LOG_ID?api-version=7.1" --output-file "/tmp/azdo-$LOG_ID.log"
+```
+
+The per-flavor `run <flavor>` task log holds the Microsoft Testing Platform (MTP) summary (`Test run summary: Zero tests ran` ⇒ the app crashed at startup). The per-test lifecycle and the native crash are **not** in this log; they are in logcat (below).
+
+## Crash culprit from logcat
+
+`scripts/ci_failures.py` flags crashed/incomplete/timed-out lanes, but the culprit test is only in the device **logcat**, published inside that lane's `Test Results - ...` build artifact (100 MB–2 GB — prefer the smaller `Debug` lane). Download it, then scan `logcat-<flavor>.txt`:
+
+```bash
+# list artifacts + sizes to pick the failing lane:
+az rest --method get --resource $RES \
+  --url "$ORG/$PROJECT/_apis/build/builds/$BUILD_ID/artifacts?api-version=7.1" \
+  --query "value[].{name:name, bytes:(resource.properties.artifactsize)}" -o json
+
+az pipelines runs artifact download --run-id $BUILD_ID --org $ORG --project $PROJECT \
+  --artifact-name "Test Results - APKs .NET Debug - macOS 1" --path /tmp/cilogs
+
+# The crasher is the LAST test that logged a start with no matching pass/fail,
+# usually right before a native signal:
+# (-r with --include avoids relying on the shell's `**` globstar support)
+grep -rnE --include='logcat-*.txt' \
+  'Running |\[PASS\]|\[FAIL\]|SIGSEGV|SIGABRT|tombstone|FATAL|art::|JNI DETECTED|Process .* died' \
+  /tmp/cilogs | tail -60
+```
+
+For a `Zero tests ran` lane the crash is at app startup (look for the first `SIGSEGV`/`tombstone`/`JNI DETECTED ERROR`, not a specific test); for a timeout the suspect is the last `Running <test>` with no result.
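A minimal sketch of the ETA rule from the "ETA for an in-progress build" section, assuming the JSON shape produced by that section's `--query` (`start`/`finish` keys) and UTC timestamps ending in `Z`. `parse_ts`, `estimate_eta`, and the sample values are hypothetical, made up for illustration; they are not part of the skill.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def parse_ts(ts):
    """Parse an ISO-8601 UTC timestamp such as '2026-06-16T10:00:00.1234567Z'.

    Assumption: the value ends in 'Z'. The fractional part is dropped because
    datetime accepts at most 6 fractional digits and an ETA does not need
    sub-second precision.
    """
    head = ts.rstrip("Z").split(".")[0]
    return datetime.strptime(head, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def estimate_eta(recent, start_time):
    """Return (median_duration, eta), or None when there is no usable history.

    recent: list of {'start': ..., 'finish': ...} dicts for green runs.
    start_time: startTime of the in-progress build.
    """
    durations = [
        (parse_ts(r["finish"]) - parse_ts(r["start"])).total_seconds()
        for r in recent
        if r.get("start") and r.get("finish")
    ]
    if not durations:
        return None
    med = timedelta(seconds=median(durations))
    return med, parse_ts(start_time) + med


# Made-up sample data: three green runs of 50, 90 and 180 minutes.
recent = [
    {"start": "2026-06-16T08:00:00Z", "finish": "2026-06-16T08:50:00Z"},
    {"start": "2026-06-15T08:00:00.1234567Z", "finish": "2026-06-15T09:30:00.7654321Z"},
    {"start": "2026-06-14T08:00:00Z", "finish": "2026-06-14T11:00:00Z"},
]
med, eta = estimate_eta(recent, "2026-06-16T12:00:00Z")
print(f"median {med}, rough ETA {eta.isoformat()}")
```

The median is used rather than the mean so that one run stuck in the agent queue for hours does not skew the estimate.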
@@ -20,7 +20,11 @@ az pipelines runs artifact list --run-id $BUILD_ID --org $ORG_URL --project $PRO
 az pipelines runs artifact list --run-id $BUILD_ID --org $ORG_URL --project $PROJECT --output json
 ```
 
-Look for artifact names containing `binlog`, `msbuild`, or `build-log`.
+Look for artifact names that contain build logs. On the `dotnet-android` (dnceng-public) pipeline the relevant ones are:
+- `Build Results - macOS` / `Build Results - Windows` / `Build Results - Linux` — contain the `.binlog` files (published mainly when a build stage fails or when `XA.PublishAllLogs` is set).
+- `Test Results - ...` — per-test-stage logs and artifacts. For the on-device `Package Tests` (APKs) stage these also include each device test's `build-<testName>.binlog`, `run-<testName>.binlog`, the `.trx`, and `logcat-<testName>.txt` (essential for native/JNI crash diagnosis).
+
+If a green build has no `Build Results - *` artifact, the binlogs weren't published; re-run with `XA.PublishAllLogs` or rely on the timeline/test queries instead.
 
 ### Download
 

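A small sketch of the artifact triage described in the hunk above, assuming the `artifact list --output json` result has been loaded as a list of objects that each carry a `name`. `split_log_artifacts` and the sample entries (including `nuget-unsigned`) are hypothetical.

```python
def split_log_artifacts(artifacts):
    """Split artifact entries into (build_results, test_results) name lists.

    Only the two name prefixes mentioned in the doc are recognised;
    everything else is ignored.
    """
    build, tests = [], []
    for entry in artifacts:
        name = entry.get("name") or ""
        if name.startswith("Build Results - "):
            build.append(name)
        elif name.startswith("Test Results - "):
            tests.append(name)
    return build, tests


# Made-up sample; only the `name` field is used.
sample = [
    {"name": "Build Results - macOS"},
    {"name": "Test Results - APKs .NET Debug - macOS 1"},
    {"name": "nuget-unsigned"},
]
build, tests = split_log_artifacts(sample)
if not build:
    print("no 'Build Results - *' artifact: binlogs were not published")
print(build, tests)
```

An empty `build` list is the "green build, no binlogs" case, where the timeline and test queries are the fallback.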
@@ -36,7 +36,7 @@ These are CI environment issues, not code problems.
 | Network | `Unable to load the service index`, `Connection refused` |
 | NuGet feed | `NU1301` (feed connectivity) |
 | Agent issues | `The agent did not connect`, `##[error] The job was canceled` |
-| Timeout (job-level) | Job canceled after 55+ minutes |
+| Timeout (job-level) | `result: canceled` + `issues[]` says *"ran longer than the maximum time of N minutes"* |
 
 ## Decision Tree
 

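The job-level timeout signature in the table row above can be checked mechanically against timeline records. The sketch below reuses the `maximum time of N minutes` pattern that `ci_failures.py` matches; `job_timeout_minutes`, the sample records, and the exact message wording are hypothetical.

```python
import re

TIMEOUT_RE = re.compile(r"maximum time of (\d+) minutes")


def job_timeout_minutes(record):
    """Return the job's time cap in minutes if this timeline record is a
    job-level timeout (canceled Job whose issues mention the cap), else None."""
    if record.get("type") != "Job" or record.get("result") != "canceled":
        return None
    text = " ".join((i.get("message") or "") for i in (record.get("issues") or []))
    m = TIMEOUT_RE.search(text)
    return int(m.group(1)) if m else None


# Made-up records shaped like timeline entries.
records = [
    {"type": "Job", "name": "macOS > Tests > APKs 2", "result": "canceled",
     "issues": [{"message": "The job ran longer than the maximum time of 180 minutes."}]},
    {"type": "Job", "name": "Linux > Build", "result": "canceled", "issues": []},
    {"type": "Task", "name": "run Mono.Android.NET_Tests-Debug", "result": "failed"},
]
timeouts = {r["name"]: job_timeout_minutes(r) for r in records}
print(timeouts)
```

A canceled job without the message (the second record) is not treated as a timeout; it may be a manual cancel or an agent problem instead.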
@@ -0,0 +1,213 @@
+#!/usr/bin/env python3
+"""Enriched failure analysis for one dnceng-public `dotnet-android` build:
+  1. cross-config matrix per failed test (failed/passed/retried configs) + stack/asserts
+  2. crashed / incomplete lanes (started-but-not-finished culprit lives in logcat)
+  3. branch cross-reference (PR changes that name a failing test's class/namespace/assembly)
+
+Needs `az login`. Usage: ci_failures.py --build-id N [--pr N] [--repo dotnet/android]
+"""
+import json, subprocess, sys, argparse, re
+from collections import defaultdict
+from concurrent.futures import ThreadPoolExecutor
+
+ORG = "https://dev.azure.com/dnceng-public"
+PROJECT = "public"
+RES = "499b84ac-1321-427f-aa17-267ca6975798"
+
+
+def az_json(url):
+    p = subprocess.run(["az", "rest", "--method", "get", "--resource", RES,
+                        "--url", url, "-o", "json"], capture_output=True, text=True)
+    if p.returncode != 0:
+        sys.stderr.write(f"az error {url}\n{p.stderr[:300]}\n")
+        return None
+    try:
+        return json.loads(p.stdout)
+    except json.JSONDecodeError:
+        return None
+
+
+def run_results(rid):
+    data = az_json(f"{ORG}/{PROJECT}/_apis/test/Runs/{rid}/results?api-version=7.1&$top=5000")
+    out = {}
+    for row in (data or {}).get("value", []):
+        n = row.get("automatedTestName")
+        if n:
+            out[n] = (row.get("outcome"), row.get("errorMessage"), row.get("stackTrace"))
+    return rid, out
+
+
+def fetch_all(rids, workers=6):
+    if not rids:
+        return {}
+    with ThreadPoolExecutor(max_workers=workers) as ex:
+        return dict(ex.map(run_results, rids))
+
+
+def base_of(name):
+    """Strip flavor/OS/index suffix so sibling configs share one base.
+    'Mono.Android.NET_Tests-NativeAOT' -> 'Mono.Android.NET_Tests';
+    'Xamarin.Android.Build.Tests - macOS-7' -> 'Xamarin.Android.Build.Tests'."""
+    b = re.sub(r' - (macOS|Windows|Linux)(-\d+)?$', '', name)
+    b = re.sub(r'-[A-Za-z0-9]+$', '', b)
+    return b
+
+
+# ---------------- section 1: cross-config matrix ----------------
+def section_matrix(bid, failed, runs, run_by_id):
+    fail_runs, storage = defaultdict(set), {}
+    for f in failed:
+        fail_runs[f["automatedTestName"]].add(f["runId"])
+        storage[f["automatedTestName"]] = f.get("automatedTestStorage")
+
+    def first_base(rids):
+        for r in rids:
+            if r in run_by_id:
+                return base_of(run_by_id[r]["name"])
+        return ""
+    fam = {n: first_base(rids) for n, rids in fail_runs.items()}
+    cand = defaultdict(list)
+    for fk in set(fam.values()):
+        for r in runs:
+            if base_of(r["name"]) == fk:
+                cand[fk].append(r)
+    cache = fetch_all(list({r["id"] for fk in fam.values() for r in cand[fk]}))
+
+    print(f"## Failed-test cross-config matrix — {len(fail_runs)} distinct test(s)\n")
+    for n in sorted(fail_runs):
+        fk = fam[n]
+        cfg = defaultdict(list)
+        for r in cand[fk]:
+            row = cache.get(r["id"], {}).get(n)
+            if row:
+                cfg[r["name"]].append((r.get("completedDate") or "", row[0]))
+        short, ns = n.rsplit(".", 1)[-1], n.rsplit(".", 1)[0]
+        print(f"### `{short}`  ({ns})")
+        print(f"- assembly `{storage.get(n)}` · family `{fk}`")
+        fl, pa, ot = [], [], []
+        for name in sorted(cfg):
+            outs = [o for _, o in sorted(cfg[name])]
+            label = name[len(fk):].lstrip(" -") or name
+            disp = "->".join(outs) + " (retry)" if len(set(outs)) > 1 else outs[0]
+            (fl if "Failed" in outs else pa if set(outs) == {"Passed"} else ot).append(
+                f"`{label}`" + ("" if disp == "Passed" else f" ({disp})"))
+        print(f"- FAILED in: {', '.join(fl) or '-'}")
+        print(f"- passed in: {', '.join(pa) or '-'}")
+        if ot:
+            print(f"- other: {', '.join(ot)}")
+        for rid in fail_runs[n]:
+            row = cache.get(rid, {}).get(n)
+            if row and row[1]:
+                print(f"- assert/error: {row[1].strip().splitlines()[0][:300]}")
+                if row[2]:
+                    print("  ```")
+                    for ln in row[2].strip().splitlines()[:6]:
+                        print("  " + ln[:200])
+                    print("  ```")
+                break
+        print()
+
+
+# ---------------- section 2: crashed / incomplete lanes ----------------
+def section_crashes(bid, runs, timeline):
+    recs = timeline.get("records", [])
+    published = {r["name"]: r for r in runs}
+    crashed = []
+    # incomplete test runs (runner died mid-run)
+    for r in runs:
+        inc = r.get("incompleteTests") or 0
+        if inc > 0:
+            crashed.append((r["name"], f"{inc} test(s) did not complete - runner died mid-run"))
+    # "run <flavor>" tasks that did not cleanly succeed AND published no (complete) results = crash/zero-tests
+    for rec in recs:
+        if rec.get("type") == "Task" and (rec.get("name") or "").startswith("run ") \
+                and rec.get("result") in ("failed", "succeededWithIssues", "canceled"):
+            flavor = rec["name"][4:].strip()
+            run = published.get(flavor)
+            if run is None or (run.get("incompleteTests") or 0) > 0:
+                crashed.append((flavor, f"`run` task {rec['result']} but no complete test run published - app likely crashed ('Zero tests ran' / native crash)"))
+    # job-level timeouts (hang)
+    for rec in recs:
+        if rec.get("type") == "Job" and rec.get("result") == "canceled":
+            msg = " ".join(i.get("message", "") for i in (rec.get("issues") or []))
+            m = re.search(r"maximum time of (\d+) minutes", msg)
+            if m:
+                crashed.append((rec["name"], f"timed out at {m.group(1)}-min cap - likely a hung test; last started test in logcat is the suspect"))
+    if not crashed:
+        return
+    print("## Crashed / incomplete lanes  (!)\n")
+    print("These went red with **no usable failed-test list** - the culprit (a test that **started but never "
+          "finished**, or a native crash) is only in the device **logcat**, not the test API:\n")
+    seen = set()
+    for name, why in crashed:
+        if (name, why) in seen:
+            continue
+        seen.add((name, why))
+        print(f"- **{name}** - {why}")
+    print()
+    print("To name the culprit, download that lane's logs artifact (large: 100MB-2GB - prefer the `Debug` lane) "
+          "and scan its logcat (see references/azdo-queries.md):\n")
+    print("```bash")
+    print(f'az pipelines runs artifact download --run-id {bid} --org {ORG} --project {PROJECT} \\')
+    print('  --artifact-name "Test Results - APKs .NET Debug - macOS 1" --path /tmp/cilogs')
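+    # A raw string cannot end in a single backslash, so the line continuation is appended separately.
+    print(r"grep -nE 'Running |\[PASS\]|\[FAIL\]|SIGSEGV|SIGABRT|tombstone|FATAL|art::|JNI DETECTED|Process .* died' " + "\\")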
+    print('  /tmp/cilogs/**/logcat-*.txt | tail -60   # last test that STARTED with no PASS/FAIL = crasher')
+    print("```\n")
+
+
+# ---------------- section 3: branch cross-reference ----------------
+def section_xref(failed, repo, pr):
+    names = sorted({f["automatedTestName"] for f in failed})
+    if not names:
+        return
+    p = subprocess.run(["gh", "pr", "diff", str(pr), "--repo", repo, "--name-only"],
+                       capture_output=True, text=True)
+    if p.returncode != 0:
+        sys.stderr.write(f"gh diff failed: {p.stderr[:200]}\n")
+        return
+    files = [f for f in p.stdout.splitlines() if f.strip()]
+    stems = {f.rsplit("/", 1)[-1].rsplit(".", 1)[0]: f for f in files}
+    print("## Branch cross-reference\n")
+    print(f"PR #{pr} changes {len(files)} file(s). Name overlaps with failing tests (judge if causal):\n")
+    any_hit = False
+    for n in names:
+        parts = n.split(".")
+        cls = parts[-2] if len(parts) >= 2 else ""
+        method, ns = parts[-1], ".".join(parts[:-2])
+        hits = set()
+        for stem, path in stems.items():
+            if stem and (stem == cls or stem == method or stem in ns.split(".") or (cls and cls in path)):
+                hits.add(path)
+        if hits:
+            any_hit = True
+            print(f"- `{cls}.{method}` <- {', '.join('`'+h+'`' for h in sorted(hits)[:5])}")
+    if not any_hit:
+        print("- No direct file-name overlap. Check whether changed runtime/build code affects the failing assembly.")
+    print()
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--build-id", required=True)
+    ap.add_argument("--pr")
+    ap.add_argument("--repo", default="dotnet/android")
+    args = ap.parse_args()
+    bid = args.build_id
+
+    failed = (az_json(f"{ORG}/{PROJECT}/_apis/test/ResultsByBuild?buildId={bid}&outcomes=Failed&api-version=7.1-preview") or {}).get("value", [])
+    runs = (az_json(f"{ORG}/{PROJECT}/_apis/test/runs?buildUri=vstfs:///Build/Build/{bid}&api-version=7.1&includeRunDetails=true") or {}).get("value", [])
+    timeline = az_json(f"{ORG}/{PROJECT}/_apis/build/builds/{bid}/timeline?api-version=7.1") or {}
+    run_by_id = {r["id"]: r for r in runs}
+
+    print(f"# Failure analysis - build {bid}\n")
+    if failed:
+        section_matrix(bid, failed, runs, run_by_id)
+    else:
+        print("_No failed tests in the test API (build may still be red via crash/timeout below)._\n")
+    section_crashes(bid, runs, timeline)
+    if args.pr:
+        section_xref(failed, args.repo, args.pr)
+
+
+if __name__ == "__main__":
+    main()