From 758fa7ba5732d824181631f9deaca97a14c9ebd7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gustavo=20Andr=C3=A9=20dos=20Santos=20Lopes?=
 <gustavo.lopes@datadoghq.com>
Date: Thu, 4 Jun 2026 11:26:34 +0200
Subject: [PATCH 1/2] Move check ci procedure to new /check-ci skill

---
 .claude/ci/index.md              | 134 ++++---------------------
 .claude/skills/check-ci/SKILL.md | 164 +++++++++++++++++++++++++++++++
 2 files changed, 183 insertions(+), 115 deletions(-)
 create mode 100644 .claude/skills/check-ci/SKILL.md
diff --git a/.claude/ci/index.md b/.claude/ci/index.md
index 55e3160823c..1a9a9c6b482 100644
--- a/.claude/ci/index.md
+++ b/.claude/ci/index.md
@@ -181,121 +181,25 @@ curl -s -H "PRIVATE-TOKEN: $GITLAB_PERSONAL_ACCESS_TOKEN" \
 
 ### Checking CI (Gitlab)
 
-Use `.claude/ci/check-ci` to follow a pipeline until all jobs complete.
-When invoked with `--commit` (or defaulting to HEAD), it monitors both the GitLab
-pipeline **and** any GitHub Actions workflows for the same commit. GitHub monitoring
-requires `ddtool auth github login --org DataDog` to be configured; if the token is
-unavailable, a warning is printed and only GitLab is monitored.
-
-Results are written to `/tmp/gitlab_<pipeline_id>/`:
-- `success.txt` — `<job_id>\t<job_name>` per line
-- `failure.txt` — same format for failed jobs (GitLab and GitHub; GitHub entries are prefixed `[GH]`)
-- `fail_logs/<job_id>.log` — full job trace for each GitLab failure
-- `gh_fail_logs/gh_<job_id>.log` — log for each GitHub Actions job failure
-
-Exit codes: 0 = all passed, 1 = failures or threshold reached.
-
-#### Invocation pattern
-
-Available options: `--commit <ref>` OR `--pipeline <id>` (GitLab only, skips GitHub),
-`--discovery-timeout <s>` (default 60), `--poll-interval <s>` (default 60),
-`--max-failures <n>` (default 50), `--timeout <s>` (default 7200 = 2 h),
-`--list-jobs` (see below).
-
-##### `--list-jobs`
-
-Prints all jobs grouped by pipeline with their status, then exits
-immediately — does not monitor or download logs. Shows both GitLab pipelines
-and GitHub Actions workflow runs. Useful for a quick snapshot of what ran and
-what failed:
-
-```bash
-.claude/ci/check-ci --commit HEAD --list-jobs
-```
-
-Output format:
-
-```
-Pipeline 105413994 (status: failed):
-  failed    test_extension_ci: [7.2]
-  success   compile extension: debug [8.3]
-  ...
-
-GitHub Actions run 12345678 'Profiling ASAN Tests' (status: completed, conclusion: failure):
-  failure   prof-asan (8.5, ubuntu-8-core-latest)
-  success   prof-asan (8.3, ubuntu-8-core-latest)
-  ...
-```
-
-#### Monitor CI
-
-If --list-jobs is not passed, check-ci will run until all monitored pipelines
-finish, until a timeout, or until the maximum number of failures is reached.
-
-**Step 1 — Start check-ci in the background (Bash tool,
-`run_in_background: true`):**
-
-```bash
-PYTHONUNBUFFERED=1 .claude/ci/check-ci [OPTIONS]
-```
-
-Do NOT add `&` or `mktemp` — run the command directly and let
-`run_in_background: true` handle backgrounding. `PYTHONUNBUFFERED=1`
-is required so Python flushes stdout into the task output file.
-The Bash tool returns immediately with a line like:
-```
-Output is being written to: /path/to/tasks/<id>.output
-```
-Note that path — it is the output file for the next step.
-
-**Step 2 — Run ci-watch in the background (Bash tool,
-`run_in_background: true`):**
-
-```bash
-.claude/ci/ci-watch [--start-offset N] OUTPUT_FILE
-```
-
-**`OUTPUT_FILE` must be the output file from a `check-ci` process** — not an
-arbitrary background task. `ci-watch` parses `check-ci`'s structured
-`FAILED:` / `SUCCESS:` lines and exits silently on anything else.
-
-`ci-watch` tails the output file and exits when there is something to
-act on. Run it with `run_in_background: true` — you will be notified
-when it completes. While it runs, you can do other work.
-
-Exit codes:
-- 0 — all pipelines completed (no failures)
-- 1 — one or more FAILED: lines detected
-- 2 — stale: no new output for 5 minutes
-- 3 — check-ci timed out
-
-On exit, ci-watch always prints `RESUME_OFFSET: <N>`. Record this
-value — pass it as `--start-offset N` when re-running ci-watch to
-skip already-processed content and wait for further failures.
-
-When ci-watch completes, immediately call the `speak_when_done` MCP tool:
-- "All CI jobs passed" if exit 0.
-- "<N> CI jobs failed" if exit 1 (count is
-  `grep "^FAILED:" OUTPUT_FILE | wc -l`).
-- "CI monitor timed out" if exit 2 or 3.
-
-**Step 3 — Act on the result**
-
-Choose mong these actions, as appropriate:
-
-- **Just report:** summarise the result to the user and stop.
-- **Investigate failures:** read `fail_logs/<job_id>.log` under the
-  output directory for each failed job and diagnose the root cause.
-- **Wait for more failures:** if check-ci is still running and you want
-  to keep watching after investigating, re-run ci-watch with
-  `--start-offset <RESUME_OFFSET>` (back to Step 2).
-- **Kill check-ci:** if you want to stop monitoring entirely, kill it
-  by its task ID or PID (noted from Step 1).
-- **Push fixes**: if a) the user asked you to (NOT OTHERWISE), AND b)
-  you have made changes to fix the CI failures AND c) the current
-  branch has an upstream branch, then commit and push. Then go back to
-  Step 1. If any of the three preconditions don't match, stop and
-  report the results (and your findings, if any).
+Use the `/check-ci` skill — it encapsulates the full procedure: starting
+`check-ci` and `ci-watch` in the background, speaking the result, and
+investigating failures. See
+[`.claude/skills/check-ci/SKILL.md`](../skills/check-ci/SKILL.md).
+
+Quick reference for the underlying tools:
+
+- `check-ci` options: `--commit <ref>` OR `--pipeline <id>` (GitLab only,
+  skips GitHub), `--discovery-timeout <s>` (default 60),
+  `--poll-interval <s>` (default 60), `--max-failures <n>` (default 50),
+  `--timeout <s>` (default 7200 = 2 h), `--list-jobs`.
+- When `--commit` is used, both GitLab and GitHub Actions are monitored.
+  GitHub monitoring requires `ddtool auth github login --org DataDog`; if
+  unavailable, a warning is printed and only GitLab is monitored.
+- Results land in `/tmp/gitlab_<pipeline_id>/`: `success.txt`,
+  `failure.txt` (GitHub entries prefixed `[GH]`),
+  `fail_logs/<job_id>.log`, `gh_fail_logs/gh_<job_id>.log`.
+- `--list-jobs` prints a grouped job table (GitLab + GitHub Actions) and
+  exits immediately — useful for a quick snapshot without monitoring.
 
 ### Downloading artifacts
 
diff --git a/.claude/skills/check-ci/SKILL.md b/.claude/skills/check-ci/SKILL.md
new file mode 100644
index 00000000000..66389b92b5d
--- /dev/null
+++ b/.claude/skills/check-ci/SKILL.md
@@ -0,0 +1,164 @@
+---
+name: check-ci
+description: >-
+  Monitor GitLab CI and GitHub Actions for this repo: start check-ci, tail
+  results with ci-watch, investigate failures, and report. Use when the user
+  asks to check, watch, or monitor CI, or to see whether a pipeline passed.
+argument-hint: "[--commit <ref> | --pipeline <id>] [--list-jobs]"
+allowed-tools: Bash Read Grep Glob Agent TaskCreate TaskUpdate TaskStop mcp__speak_when_done__speak
+effort: high
+---
+
+# Check CI
+
+Monitor GitLab CI and GitHub Actions until all jobs finish, then investigate
+any failures and report results.
+
+When `--commit` is used (or defaulting to HEAD), both GitLab pipelines and
+GitHub Actions workflow runs are monitored. GitHub monitoring requires
+`ddtool auth github login --org DataDog`; if unavailable, a warning is
+printed and only GitLab is monitored. `--pipeline <id>` is GitLab-only and
+skips GitHub.
+
+## Input
+
+`$ARGUMENTS` may contain any combination of:
+- `--commit <ref>` — git ref to resolve (default: HEAD); monitors GitLab +
+  GitHub
+- `--pipeline <id>` — specific GitLab pipeline ID (skips GitHub monitoring)
+- `--list-jobs` — quick snapshot mode (no monitoring)
+
+If no `--commit` or `--pipeline` is given, default to `--commit HEAD`.
+
+## Quick mode — `--list-jobs`
+
+Run synchronously and exit immediately:
+
+```bash
+.claude/ci/check-ci --commit <ref> --list-jobs
+```
+
+Prints all jobs grouped by pipeline (GitLab) and workflow run (GitHub
+Actions) with their status. Print the table to the user and stop. Do not
+continue to the monitoring steps.
+
+## Full monitoring mode
+
+### Step 1 — Start check-ci in the background
+
+```bash
+PYTHONUNBUFFERED=1 .claude/ci/check-ci [OPTIONS]
+```
+
+- Use `run_in_background: true` in the Bash tool invocation. Do NOT append `&`
+  or redirect output.
+- The Bash tool returns immediately with an output file path like
+  `/path/to/tasks/<id>.output` ("Output is being written to ..." in the tool
+  invocation output). Note this path — it is required in Step 2. This file path
+  will be referred to as `OUTPUT_FILE` henceforth.
+- Default options if the user provided none: `--commit HEAD`.
+- You may also pass `--max-failures 50` (default) and
+  `--timeout 7200` (default, 2 h).
+
+### Step 2 — Start ci-watch in the background
+
+```bash
+.claude/ci/ci-watch [--start-offset N] OUTPUT_FILE
+```
+
+- `OUTPUT_FILE` must be the output file from the check-ci task above.
+- Use `run_in_background: true`.
+- You are notified when ci-watch exits through a task notification. While it
+  runs, you may do other work.
+- On exit, ci-watch always prints `RESUME_OFFSET: <N>`. Record it for re-runs.
+
+ci-watch exit codes:
+| Code | Meaning |
+|------|---------|
+| 0 | All pipelines completed — no failures |
+| 1 | One or more `FAILED:` lines detected |
+| 2 | Stale — no new output for 5 minutes |
+| 3 | check-ci timed out |
+
+### Step 3 — Speak and act on the result
+
+**Immediately after ci-watch exits**, call
+`mcp__speak_when_done__speak(message="...")` (the first time, you'll need to
+invoke `ToolSearch("select:mcp__speak_when_done__speak")`):
+- Exit 0 → "All CI jobs passed"
+- Exit 1 → "<N> CI jobs failed" (count with
+  `grep "^FAILED:" OUTPUT_FILE | wc -l`)
+- Exit 2 or 3 → "CI monitor timed out"
+
+Then choose the appropriate action:
+
+#### All jobs passed (exit 0)
+
+Report success to the user and stop.
+
+#### Failures detected (exit 1)
+
+1. List the failed jobs:
+   ```bash
+   grep "^FAILED:" OUTPUT_FILE
+   ```
+   The output directory is `/tmp/gitlab_<pipeline_id>/`. Logs are at:
+   - `fail_logs/<job_id>.log` — GitLab job traces
+   - `gh_fail_logs/gh_<job_id>.log` — GitHub Actions job logs
+   GitHub entries in `failure.txt` are prefixed `[GH]`.
+
+2. Read each failure log and diagnose the root cause. Look for:
+   - Compile errors or linker failures
+   - Test assertion failures (include the failing test name and diff)
+   - Infrastructure/flakiness signals (timeout, network, Docker pull failures,
+     OOM) — mark these as flaky rather than real failures.
+
+   You don't need to go through all of them if it becomes evident that doing
+   so is unnecessary.
+
+3. Report findings grouped by root cause.
+
+4. **Fix and push only when all three conditions hold:**
+   a. The user explicitly asked you to fix CI failures.
+   b. You have made changes to address the failures.
+   c. The current branch has an upstream remote branch.
+   If any condition is missing, stop and report instead.
+
+   When all three hold: commit the fix, push, then go back to Step 1
+   to re-monitor.
+
+   If possible, before attempting a fix, try to reproduce the failure locally.
+   Check @.claude/ci/index.md for instructions. Then attempt your fix and rerun
+   to confirm the fix resolves the problem.
+
+#### Stale or timed out (exit 2 or 3)
+
+Re-run ci-watch with `--start-offset <RESUME_OFFSET>` (Step 2) to
+resume watching from where you left off. If check-ci itself has also
+exited, restart from Step 1.
+
+#### Keep watching (user wants to continue after investigation)
+
+Re-run ci-watch with `--start-offset <RESUME_OFFSET>` (back to Step 2).
+
+## Downloading artifacts
+
+Use `tooling/bin/download-artifacts` to fetch build outputs from CI jobs
+(e.g., compiled extensions, SSI loader, datadog-setup.php). Useful when
+investigating a failure that produced an artifact worth inspecting locally.
+
+## Rules
+
+- Never push unless the user explicitly asked for it. See the global
+  instruction "Do not push to git remotes unless explicitly asked to."
+- Flaky jobs (known to be intermittent, unrelated to the current
+  changes) should be noted but not treated as real failures requiring
+  a fix. However, to confirm that a failing test is flaky, you should
+  look for similar failures in the merge base.
+- `GITLAB_PERSONAL_ACCESS_TOKEN` is already set in the environment —
+  do not re-export it.
+- Raw job logs can also be fetched directly. For GitLab:
+  ```bash
+  curl -s -H "PRIVATE-TOKEN: $GITLAB_PERSONAL_ACCESS_TOKEN" \
+    "https://gitlab.ddbuild.io/api/v4/projects/355/jobs/<JOB_ID>/trace"
+  ```

From e3e161c7df1bd96b994b30d6293f3416e2d11743 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gustavo=20Andr=C3=A9=20dos=20Santos=20Lopes?=
 <gustavo.lopes@datadoghq.com>
Date: Tue, 9 Jun 2026 11:42:36 +0100
Subject: [PATCH 2/2] /crash-analysis: adapt to 1.8 schema

---
 .claude/skills/crash-analysis/SKILL.md | 102 ++++++++++++++++++++-----
 1 file changed, 85 insertions(+), 17 deletions(-)

diff --git a/.claude/skills/crash-analysis/SKILL.md b/.claude/skills/crash-analysis/SKILL.md
index 4d59ccdd447..cdb6703c78b 100644
--- a/.claude/skills/crash-analysis/SKILL.md
+++ b/.claude/skills/crash-analysis/SKILL.md
@@ -13,6 +13,9 @@ effort: max
 
 Systematically analyze a dd-trace-php crash report to identify the root cause.
 
+> **Schema note:** The event file you receive is the **backend-enriched** form,
+> not the one libdatadog emits.
+
 ## Input
 
 The user provides a crash event JSON file (path via `$ARGUMENTS`, or pasted
@@ -25,23 +28,58 @@ file. Do all extractions in parallel where possible.
 
 | Field | Command |
 |-------|---------|
-| Library version | `jq -c .metadata.library_version $EVENT` |
+| Library version | `jq -r '.tracer_version // (.library_version \| "\(.major).\(.minor).\(.patch)")' $EVENT` |
 | Signal | `jq -c .sig_info $EVENT` |
-| Native stacktrace | `jq -c .error.stack.frames $EVENT` |
+| Error summary | `jq -r '"\(.error.type // "") \(.error.message // "")"' $EVENT` |
+| Crash diagnosis | `jq -c .crash_diagnosis $EVENT` |
+| Native stacktrace | `jq -c '.error.stack.frames' $EVENT` |
 | PHP stacktrace | `jq -c .experimental.runtime_stack.frames $EVENT` |
 | Mapped files | `jq -c '.files["/proc/self/maps"]' $EVENT` |
 | Registers | `jq -c '.ucontext // .experimental.ucontext' $EVENT \| .claude/parse_ucontext.py` |
-| PHP version | `jq -r '.language_version // (.metadata.tags[] \| select(startswith("runtime_version:")) \| split(":")[1])' $EVENT` |
+| PHP version | `jq -r '.language_version // .runtime_version // (.metadata.tags[] \| select(startswith("runtime_version:")) \| split(":")[1])' $EVENT` |
+| OS / arch | `jq -r '(.os_info.os_type // .host.os) + " " + (.os_info.architecture // .host.arch // "unknown") + " (kernel " + (.host.version // "?") + ")"' $EVENT` |
 
-> **Note:** `parse_ucontext.py` only supports amd64. On aarch64, skip this step
-> and read register values directly from `.ucontext` (or `.experimental.ucontext`
-> in older events).
+> **Note:** `parse_ucontext.py` only supports amd64. Check `.ucontext.arch`
+> first; on aarch64, skip the script and read register values directly from
+> `.ucontext.raw`.
 
-From the mapped files, determine:
-- **Products loaded**: look for `ddtrace.so`, `ddappsec.so`, `datadog-profiling.so`
-- **SSI mode**: check for `libdatadog_php.so` and `dd_library_loader.so` — if present, the process is running the SSI (Single-Step Instrumentation) package. See [SSI architecture](#ssi-architecture) below.
-- **OS/arch**: architecture (x86_64 or aarch64)
-- **libc**: GNU (`ld-linux-x86-64.so`) or musl (`ld-musl-x86-64.so`)
+### crash_diagnosis (schema 1.8+)
+
+`crash_diagnosis` is computed **server-side by the Datadog errors-worker**
+(DataDog/dd-source,
+`domains/evp-workers/apps/errors-worker/src/crashtracking/`), not by
+libdatadog. It consumes `sig_info`, `ucontext.registers`, and `/proc/self/maps`
+from the event; if any is absent, the field is omitted. Use it to confirm (not
+skip!) manual triage steps:
+
+| Field | Meaning |
+|-------|---------|
+| `category` | Crash category — see enum below |
+| `summary` | One-line human-readable description |
+| `details` | Extended description with signal/address details and analysis rationale |
+| `crashLocation` | Optional. The memory mapping containing the instruction pointer at crash time |
+| `crashLocation.path` | Binary where the crashing instruction lives |
+| `crashLocation.offsetInMapping` | Offset within that binary's mapped region (hex) |
+| `crashLocation.permissions` | Mapping permissions (`r-xp` = executable code) |
+| `faultAddressMapped` | Optional. `true` = fault address is in a mapped region; `false` = not mapped (wild pointer); absent if `si_addr` unavailable |
+| `faultAddressMapping` | If `faultAddressMapped` is `true`, the mapping containing the fault address |
+| `nullRegisters` | Registers whose value was < 0x1000 (null page threshold) at crash time — these are the likely null pointer sources for a `NullPointerDereference` |
+| `stackPointerValid` | Optional. `false` = SP is outside the `[stack]` mapping; stack is corrupt; makes native stacktrace unreliable |
+
+#### DiagnosisCategory enum (complete)
+
+| Value | Signal | Condition |
+|-------|--------|-----------|
+| `NullPointerDereference` | SIGSEGV/SEGV_MAPERR | fault addr < 0x1000 (null page) |
+| `StackOverflow` | SIGSEGV/SEGV_MAPERR | fault addr within 8 KB of stack guard page |
+| `UseAfterFree` | SIGSEGV/SEGV_MAPERR | fault addr within 1 MB past heap end |
+| `WildPointer` | SIGSEGV/SEGV_MAPERR | unmapped address, no recognizable pattern |
+| `WriteToReadOnly` | SIGSEGV/SEGV_ACCERR | faulting mapping is non-writable |
+| `ExecuteNonExecutable` | SIGSEGV/SEGV_ACCERR or SIGILL | fault addr == IP and mapping is non-executable |
+| `MisalignedAccess` | SIGBUS/BUS_ADRALN | misaligned memory access (BUS_ADRALN only — BUS_ADRERR, e.g. file-mapped access beyond EOF, maps to `Unknown`) |
+| `IllegalInstruction` | SIGILL | invalid opcode in executable region |
+| `IntentionalAbort` | SIGABRT | assert(), panic!(), or allocator corruption |
+| `Unknown` | any | no pattern matched |
 
 ### SSI architecture
 
@@ -76,16 +114,17 @@ crash and understand context:
 | `profiler_unwinding` | `0` | `counters.rs` | Nonzero = profiler was unwinding the stack at crash time. |
 | `profiler_serializing` | `0` | `counters.rs` | Nonzero = profiler was serializing data at crash time. |
 | `si_signo` | `11` | `sig_info.rs` | Raw signal number (`11` = `SIGSEGV`). |
-| `si_signo_human_readable` | `sigsegv` | `sig_info.rs` | Signal name (`SIGSEGV`, `SIGBUS`, `SIGILL`, `SIGFPE`, …). Older versions may be lowercase. |
+| `si_signo_human_readable` | `SIGSEGV` | `sig_info.rs` | Signal name (`SIGSEGV`, `SIGBUS`, `SIGILL`, `SIGFPE`, …). Always uppercase. |
 | `si_code` | `1` | `sig_info.rs` | Raw signal code; meaning is signal-dependent. |
-| `si_code_human_readable` | `segv_maperr` | `sig_info.rs` | Signal code name (`SEGV_MAPERR`, `SEGV_ACCERR`, `BUS_ADRALN`, `ILL_ILLOPC`, …). |
+| `si_code_human_readable` | `SEGV_MAPERR` | `sig_info.rs` | Signal code name (`SEGV_MAPERR`, `SEGV_ACCERR`, `BUS_ADRALN`, `ILL_ILLOPC`, …). |
 | `si_addr` | `0x00007ff894af86c8` | `sig_info.rs` | Fault address from `siginfo_t.si_addr`. |
 | `is_crash` | `true` | `errors_intake.rs` / `sidecar.c` | Always `true` for crash reports. |
 | `incomplete` | `false` | `errors_intake.rs` | `true` = stack trace is truncated / could not fully unwind. |
-| `data_schema_version` | `1.4` | `errors_intake.rs` | JSON schema version; current is `1.5`. |
+| `language` | `php` | `sidecar.c` | Language identifier pushed as `language:php`. |
+| `runtime` | `php` | `sidecar.c` | Runtime identifier pushed as `runtime:php`. |
+| `data_schema_version` | `1.8` | `errors_intake.rs` | JSON schema version; current is `1.8`. |
 | `uuid` | `2f530826-…` | `errors_intake.rs` | RFC 4122 UUID shared between crash ping and crash report. |
 | `version` | `1.16.0` | `sidecar.c` | Service version from `DD_VERSION` or the active APM span. |
-| `source` | `php` | `sidecar.c` | Language/runtime identifier (`"php"`). |
 | `team` | `telemetry-and-analytics` | Datadog backend | Internal routing tag injected by the intake pipeline. Not from PHP code. |
 | `instrumented_service` | `web.request` | Datadog Agent/backend | Resource/span type at crash time. Not from PHP code. |
 | `datacenter` | `us1.prod.dog` | Datadog backend | Intake datacenter/region tag. Not from PHP code. |
@@ -96,7 +135,17 @@ Check whether any profiler counter (`profiler_collecting_sample`,
 `profiler_unwinding`, `profiler_serializing`) is nonzero — this attributes the
 crash to profiler activity.
 
-Print the triage summary before continuing.
+From the mapped files, determine:
+- **Products loaded**: look for `ddtrace.so`, `ddappsec.so`,
+  `datadog-profiling.so`
+- **SSI mode**: check for `libdatadog_php.so` and `dd_library_loader.so` — if
+  present, the process is running the SSI (Single-Step Instrumentation)
+  package. See [SSI architecture](#ssi-architecture) below.
+- **OS/arch**: prefer `os_info.architecture` (schema 1.8+) over `host.arch`
+  (which may be empty); if neither is set, read the mapped ld-linux file name
+- **libc**: GNU (`ld-linux-x86-64.so`) or musl (`ld-musl-x86-64.so`)
+
+Finally, print the triage summary before continuing.
 
 ## Phase 2 — Stacktrace correlation
 
@@ -104,6 +153,12 @@ Checkout the matching version tag in a worktree (tags are like `1.16.0`).
 For PHP source, use the `php-src` repository next to this checkout; PHP tags
 are like `PHP-8.1.33`.
 
+PHP runtime frames (`experimental.runtime_stack.frames`, format:
+`"Datadog Runtime Callback 1.0"`) contain:
+- `file` / `function` / `line` — source location
+- `type_name` — class name when the frame is a method call (e.g.
+  `"Couchbase\\Collection"`)
+
 > **Note:** Ondřej Surý packages for Debian may be slightly modified relative to
 > upstream PHP. If discrepancies appear, use `apt-get source` inside an
 > appropriate Docker container to obtain the exact source.
@@ -132,6 +187,18 @@ If frames land in unknown binaries, note them but focus on Datadog frames first.
 **If you can identify the root cause at this point, stop and report.** Only
 continue to Phase 3/4 if the analysis is ambiguous or low-confidence.
 
+Note: the authoritative native stacktrace for the crashing thread is
+**`.error.stack.frames`** (format: `"Datadog Crashtracker 1.0"`, always
+populated when a stack could be captured). The crashing thread name is in
+`.error.thread_name`.
+
+`error.threads` is a per-thread snapshot array present in schema 1.8+. Each
+element carries a `crashed` boolean flag, `name`, `state`, and
+`stack.{frames, incomplete}`. In practice, `crashed` is often `false` on every
+thread and `stack.frames` is `null` with `incomplete: true`, so the per-thread
+stacks are frequently unavailable. Use them as supplementary context only; do
+not rely on them as the primary frame source.
+
 ## Phase 3 — Binary verification (if needed)
 
 If the stacktrace correlation is ambiguous or the crash is in Datadog code:
@@ -148,7 +215,8 @@ If the stacktrace correlation is ambiguous or the crash is in Datadog code:
      .claude/dd_php_release_url '<version>' '<php_minor>' '<arch>' '<gnu|musl>'
      ```
    Both print a temp directory with the extracted package. Use the version
-   exactly as it appears in `metadata.library_version`.
+   exactly as it appears in `tracer_version` (or reconstructed from
+   `library_version`).
 
 2. Verify the binary matches the crash by comparing:
    - Size of first mapped region (from `/proc/self/maps`) vs. `p_memsz` of the