Skip to content

Add clocks and cpubench subcommands; fix #161 PLL decode#162

Merged
widgetii merged 1 commit into
masterfrom
clocks-cpubench-issue-161
May 14, 2026
Merged

Add clocks and cpubench subcommands; fix #161 PLL decode#162
widgetii merged 1 commit into
masterfrom
clocks-cpubench-issue-161

Conversation

@widgetii
Copy link
Copy Markdown
Member

Summary

  • Adds ipctool clocks (alias freq) — reports CPU PLL, DDR clock, and per-die HPM characterization of the running SoC as YAML (default) or JSON. Also surfaces as a top-level clocks: section in the default ipctool no-arg survey.
  • Adds ipctool cpubench — multi-pattern inline-asm CPU clock triangulation. Three Cortex-A7-tuned patterns (dep_add, indep_add, dep_mul) cross-validate the register-based clock decode. Kept as a battle-proven test for future board bring-up when a PLL register decode is suspect.
  • Fixes the PLL register identification asserted in Show current CPU clock, DDR clock, and per-die HPM characterization #161: 0x12010014 is not the CPU PLL FBDIV; the real CPU PLL (APLL) is the register pair (0x12010000, 0x12010004) with multi-field decode (FRACDIV/POSTDIV1/POSTDIV2 + FBDIV/REFDIV). All current V4 boards run at 900 MHz.

Closes #161.

Falsifying the issue's register interpretation

The issue body identifies 0x12010014 as "CPU PLL FBDIV" with values 0x01770000 → 952 MHz (Board A) vs 0x018F0000 → 1144 MHz (Board B), attributed to per-die HPM-driven mask-ROM binning.

Bench on three V4 lab boards with three different values at 0x12010014:

Board 0x12010014 raw cpubench median MHz cpubench dep_mul MHz
hi3516ev300 OpenIPC 0x00000000 882 900
gk7205v300 OpenIPC 0x01770000 (claimed 952) 874 892
gk7205v300 XM Sofia 0x018F0000 (claimed 1144) 844* 863*

* XM has unkillable kernel/userspace background load that drags estimates ~5% low.

All three boards run at the same CPU clock regardless of the 0x12010014 value, so it cannot be the live CPU PLL FBDIV. The value at 0x12010014 is mask-ROM-written and correlates with per-die HPM binning, but doesn't drive any active clock.

What 0x12010004 is

The Hi3516A SDK kernel patch (Hi3516EV200_SDK_V1.0.1.2/osdrv/opensource/kernel/linux-4.9.37.patch) defines struct hi3516a_pll_clock for APLL as a register pair (0x0, 0x4):

ctrl_reg1 = 0x00:  FRACDIV[23:0]   POSTDIV1[26:24]  POSTDIV2[30:28]
ctrl_reg2 = 0x04:  FBDIV[11:0]     REFDIV[17:12]
f = 24 MHz * FBDIV / (REFDIV * POSTDIV1 * POSTDIV2)

The V4 SDK kernel patch itself defines no PLL struct (clk-hi3516ev300.c declares only fixed-rate and mux/gate clocks — the V4 Linux clock driver treats CPU clock as fixed-rate post-boot). Applying the Hi3516A layout to V4's actual reads (0x12010000, 0x12010004) = (0x12000000, 0x0100104B) gives FBDIV=75, REFDIV=1, POSTDIV1=2, POSTDIV2=1 → 900 MHz, within 0.2% of the cpubench triangulation.

Cross-checked: both u-boot-hi3516ev200/reg_info_hi3516ev300.bin and u-boot-gk7205v200/reg_info_gk7205v300.bin write only 0x12010080 (DDR cksel). Neither writes the APLL register pair — the mask ROM is the sole writer and the value is identical across all V4 boards (HiSi and Goke brandings).

V4 family register map shipped in this PR

Field Address Decode
cpu_pll (APLL) (0x12010000, 0x12010004) 5-field multi-register
ddr cksel 0x12010080 bits [5:3] mux: 24 / 450 / 300 / 297 MHz (DDR3 ×4)
hpm 0x1202015C bits [25:16] value + binning thresholds 190/310
hpm.aux (HPM_CORE_REG0) 0x120280D8 per-die fingerprint
pll_shadow_0c (raw) 0x1201000C "not a live FBDIV" diagnostic
pll_shadow_14 (raw) 0x12010014 "not a live FBDIV" diagnostic

The two pll_shadow_* entries are kept so users can still spot per-die HPM correlations in the field, but with an explicit note that they don't drive any clock.

Sample output (gk7205v300 OpenIPC)

clocks:
  family: Hisilicon V4 / Goke V300
  cpu_pll:
    ctrl_reg1: 0x12010000
    ctrl_reg1_raw: 0x12000000
    ctrl_reg2: 0x12010004
    ctrl_reg2_raw: 0x0100104b
    fbdiv: 75
    refdiv: 1
    postdiv1: 2
    postdiv2: 1
    fracdiv: 0x000000
    freq_mhz: 900
  ddr:
    reg: 0x12010080
    cksel: 1
    freq_mhz: 450
    data_rate_mbps: 1800
  hpm:
    value: 241
    bin: mid
    binning_window: [190, 310]
    aux_reg: 0x120280d8
    aux_value: 0x81020103
  pll_shadow_14:
    reg: 0x12010014
    raw: 0x01770000
    note: mask-ROM HPM-bin shadow; not a live FBDIV (see clocks_v4.c)
cpubench:
  patterns:
    dep_add:    {ops_per_sec_M: 736.1, cycles_per_op: 1.1875, freq_mhz: 874.1, model: "A7: 1 cyc/op dep ALU"}
    indep_add:  {ops_per_sec_M: 1398.4, cycles_per_op: 0.594, freq_mhz: 830.3, model: "A7: 0.5 cyc/op dual-issue"}
    dep_mul:    {ops_per_sec_M: 279.8, cycles_per_op: 3.1875, freq_mhz: 891.8, model: "A7: 3-cyc latency dep MUL"}
  consensus_freq_mhz: 874  # median, robust to indep_add under-shoot
  spread_pct: 7.0

Test plan

  • Cross-build with arm-openipc-linux-musleabi (CI baseline, hi3516cv100 toolchain) + bare UPX — no warnings under -Wextra -O0/-Os.
  • ipctool clocks on hi3516ev300 OpenIPC → 900 MHz, shadow regs both zero (mask ROM didn't write them on the HiSi brand).
  • ipctool clocks on gk7205v300 OpenIPC → 900 MHz, shadow regs 0x01770000/0x018F0000 (HPM-bin shadow only, properly labeled).
  • ipctool clocks on gk7205v300 XM Sofia → 900 MHz, shadow regs 0x018F0000/0x01970000.
  • ipctool no-arg → clocks: section appears alongside chip:, ram:, sensors:, etc.
  • ipctool cpubench on all three boards → median 844-882 MHz, dep_mul 863-900 MHz (within model noise of 900).
  • ipctool clocks --json and ipctool cpubench --json → valid JSON via jq.
  • ipctool freq alias works.

Out of scope (TODOs in code comments)

  • V1/V2/V3/V3A/V4A/OT/3536C/3536D family tables (the table-driven structure makes each a pure-data addition).
  • DDR type auto-detect from DDRC mode registers at 0x120d0000 (currently assumes DDR3 quad-pumped via rate_mult = 4).
  • A53 cluster PLL, ETH PLL on V4 once their register layouts are verified.
  • Per-CPU scaling_cur_freq for SMP variants (cpu0 only).

🤖 Generated with Claude Code

Closes #161.

The new `clocks` subcommand exposes the running SoC's CPU PLL, DDR
clock, and per-die HPM (Hardware Performance Monitor) characterization
as YAML (default) or JSON (`--json`). Output is also wired into the
default `ipctool` survey as a top-level `clocks:` section.

The decode is table-driven (`struct pll_info`, `struct mux_info`,
`struct hpm_info`, `struct raw_reg_info`) so future SoC families can
be added as pure data next to `src/hal/hisi/clocks_v4.c`.

V4 family register map, verified register-for-register on three lab
boards (hi3516ev300 OpenIPC, gk7205v300 OpenIPC, gk7205v300 XM Sofia):

  CPU PLL (APLL)  register pair (0x12010000, 0x12010004)
                  ctrl_reg1 [23:0]  FRACDIV
                            [26:24] POSTDIV1
                            [30:28] POSTDIV2
                  ctrl_reg2 [11:0]  FBDIV
                            [17:12] REFDIV
                  f = 24 MHz * FBDIV / (REFDIV * POSTDIV1 * POSTDIV2)
  DDR cksel mux   0x12010080 bits [5:3]  (001->450 MHz, DDR3 x4)
  HPM             0x1202015c + 0x120280d8 aux fingerprint

APLL bit layout borrowed from `struct hi3516a_pll_clock` in the
HiSilicon SDK kernel patch (Hi3516EV200_SDK_V1.0.1.2 / linux-4.9.37
patch). The V4 SDK kernel patch itself defines no PLL struct -- V4's
kernel clock driver treats CPU clock as fixed-rate post-boot, so the
mask ROM is the only writer; both vendor u-boots leave the APLL
register pair untouched (only 0x12010080 DDR cksel is written by
both, confirmed via xxd of the reg_info_*.bin tables).

Issue #161 originally identified `0x12010014` as "CPU PLL FBDIV"
with `0x01770000 -> 952 MHz` (Board A) vs `0x018F0000 -> 1144 MHz`
(Board B). Bench on three boards with three different values at
`0x12010014` (`0x00000000`, `0x01770000`, `0x018F0000`) yields
identical CPU clock to within 3% noise -- the value at `0x14` is a
mask-ROM-written diagnostic that correlates with per-die HPM binning
but does NOT drive any active clock. This commit exposes `0x12010014`
and `0x1201000c` as `pll_shadow_*` raw entries (still useful for
fleet silicon-binning comparisons) with a "not a live FBDIV" note.

The multi-pattern triangulation that settled the question is now
shipped as `ipctool cpubench` -- runs three inline-asm patterns with
known Cortex-A7 throughput (dep_add 1 cyc/op, indep_add 0.5 cyc/op
dual-issue, dep_mul 3-cyc latency) and back-calculates the clock
from each, reporting median consensus and spread. Useful for future
board bring-up when a PLL register decode is suspect.

End-to-end on all three lab boards:
  hi3516ev300 OpenIPC:   cpu_pll.freq_mhz=900, cpubench median 882
  gk7205v300 OpenIPC:    cpu_pll.freq_mhz=900, cpubench median 874
  gk7205v300 XM Sofia:   cpu_pll.freq_mhz=900, cpubench median 844*

  * XM has unkillable kernel/userspace background load that drags
    bench down ~5%; dep_mul (most reliable) still hits 863 MHz.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@widgetii widgetii merged commit bc4151d into master May 14, 2026
3 checks passed
@widgetii widgetii deleted the clocks-cpubench-issue-161 branch May 14, 2026 12:46
widgetii added a commit that referenced this pull request May 14, 2026
PR review feedback (#163): the YAML emitted by the no-arg `ipctool`
survey was too verbose for a daily-driver summary, and the previous
commit's `note:` fields + `family:` line added noise without helping
anyone read the output.

Changes:

- `clocks_build_json()` gains a `bool brief` parameter.
  - `brief = true`  (used by `ipctool` no-arg survey): emits only
    `cpu_pll.freq_mhz`, `ddr.data_rate_mbps`, `hpm.bin`.
  - `brief = false` (used by `ipctool clocks`): emits the full
    register-by-register YAML the subcommand has always produced.
- Remove the `family:` line; the chip name is already in `chip.model`
  one level up.
- Remove every `note:` field. They were either workarounds for
  "we haven't fully decoded this register" or restated info already
  obvious from sibling fields.
- Remove `pll_shadow_0c/14` (#162) and `ddr_pll/eth_pll/video_pll/
  pll_lock_status` (this PR's earlier commit). Their FBDIV bit layout
  on V4 differs from APLL (field-shaped bytes sit at [23:16] rather
  than [11:0]) and the REFDIV/POSTDIV positions aren't confirmed --
  shipping the raw words with a "decode TBD" note was noise. Per
  review: decypher or remove totally -- removed until a DDR-throughput
  probe and ethernet PHY rate cross-check anchor the layout.
- Keep the lock-bit signal that IS reliable: APLL lock from
  PERI_CRG_PLL122 bit 0. Expose it as `cpu_pll.locked: true/false` in
  the full output (bit 0 = APLL on V4 confirmed across all three lab
  boards; other bits' assignment on V4 still TBD).

Survey output is now (Goke V300 example):

  clocks:
    cpu_pll:
      freq_mhz: 900
    ddr:
      data_rate_mbps: 1800
    hpm:
      bin: mid

Verified end-to-end on all three lab boards (hi3516ev300 OpenIPC,
gk7205v300 OpenIPC, gk7205v300 XM Sofia): identical 8-line `clocks:`
section in survey mode; full `ipctool clocks` output unchanged
except for the dropped notes/family and the added cpu_pll.locked.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request May 14, 2026
* clocks: rename pll_shadow_* to ddr_pll/eth_pll, add video_pll + lock status

Follow-up to #162. The `pll_shadow_0c` / `pll_shadow_14` entries shipped
there were honest empirical observations ("vary by per-die HPM, don't
drive CPU clock") but used misleading labels: those registers aren't
shadows, they are real PLL configuration pairs for non-CPU domains.

Source: the V4A / HI3516CV500 mask-ROM reverse engineering at
github.com/widgetii/HI3516CV500-SDK/blob/master/sdk/bootrom/bootrom-re/
regmap-crg.h names them explicitly:

  APLL_CONFIG_0/1 = CRG_BASE + 0x00/0x04   CPU PLL
  DPLL_CONFIG_0/1 = CRG_BASE + 0x08/0x0C   DDR PLL
  EPLL_CONFIG_0/1 = CRG_BASE + 0x10/0x14   Ethernet PLL
  VPLL_CONFIG_0/1 = CRG_BASE + 0x18/0x1C   Video PLL
  PLL_LOCK_STAT   = CRG_BASE + 0x1E8       lock status

V4A is one generation up from V4 but shares the CRG layout (same APLL
definition recurs in the Hi3516A SDK kernel patch and was empirically
validated on V4 in #162), so the per-PLL labelling carries over.

Changes:
- `struct raw_reg_info` gains an optional `reg2` field so a single entry
  can dump both CONFIG_0 and CONFIG_1 of a PLL pair.
- Rename `pll_shadow_0c` -> `ddr_pll`, `pll_shadow_14` -> `eth_pll`.
  Add `video_pll` at (0x18, 0x1C) which was previously unexposed.
- Add `pll_lock_status` reading PERI_CRG_PLL122 (0x120101E8) raw, with
  a note explaining the V4A bootrom polls `0xB` (bits 0/1/3 for
  APLL/DPLL/EPLL) but V4 reads `0x5` (bits 0/2) on all three lab
  boards -- so V4's per-bit assignment likely differs from V4A's;
  ship the raw value, leave per-bit decode as a TODO.

FBDIV decode for DPLL/EPLL/VPLL is also TODO: empirically the
FBDIV-shaped bytes sit at bits [23:16] (0x8F/0x97/0x77 in the field
data) rather than [11:0] like APLL, so the layout differs. Need a
DDR-throughput probe and ethernet PHY rate cross-check before
shipping a frequency calculation. The raw config pair is dumped
verbatim so users can fleet-compare without misreading.

Verified on three lab boards: cpu_pll.freq_mhz still 900 (unchanged),
new fields render as expected, no regressions in ddr / hpm /
cpu_running / cpubench outputs. The hi3516ev300 (HiSi-branded)
firmware reads zero for both DPLL and EPLL config pairs while DDR
still runs at 450 MHz, suggesting the HiSi firmware leaves those
PLLs gated and derives DDR clock differently; the Goke firmware
actively programs DPLL and EPLL. Out of scope for this PR.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* clocks: brief survey output, drop family/notes/non-decoded entries

PR review feedback (#163): the YAML emitted by the no-arg `ipctool`
survey was too verbose for a daily-driver summary, and the previous
commit's `note:` fields + `family:` line added noise without helping
anyone read the output.

Changes:

- `clocks_build_json()` gains a `bool brief` parameter.
  - `brief = true`  (used by `ipctool` no-arg survey): emits only
    `cpu_pll.freq_mhz`, `ddr.data_rate_mbps`, `hpm.bin`.
  - `brief = false` (used by `ipctool clocks`): emits the full
    register-by-register YAML the subcommand has always produced.
- Remove the `family:` line; the chip name is already in `chip.model`
  one level up.
- Remove every `note:` field. They were either workarounds for
  "we haven't fully decoded this register" or restated info already
  obvious from sibling fields.
- Remove `pll_shadow_0c/14` (#162) and `ddr_pll/eth_pll/video_pll/
  pll_lock_status` (this PR's earlier commit). Their FBDIV bit layout
  on V4 differs from APLL (field-shaped bytes sit at [23:16] rather
  than [11:0]) and the REFDIV/POSTDIV positions aren't confirmed --
  shipping the raw words with a "decode TBD" note was noise. Per
  review: decypher or remove totally -- removed until a DDR-throughput
  probe and ethernet PHY rate cross-check anchor the layout.
- Keep the lock-bit signal that IS reliable: APLL lock from
  PERI_CRG_PLL122 bit 0. Expose it as `cpu_pll.locked: true/false` in
  the full output (bit 0 = APLL on V4 confirmed across all three lab
  boards; other bits' assignment on V4 still TBD).

Survey output is now (Goke V300 example):

  clocks:
    cpu_pll:
      freq_mhz: 900
    ddr:
      data_rate_mbps: 1800
    hpm:
      bin: mid

Verified end-to-end on all three lab boards (hi3516ev300 OpenIPC,
gk7205v300 OpenIPC, gk7205v300 XM Sofia): identical 8-line `clocks:`
section in survey mode; full `ipctool clocks` output unchanged
except for the dropped notes/family and the added cpu_pll.locked.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request May 14, 2026
…165)

Closes #160.

`ipctool membw` runs three synthetic memory-bandwidth ops against
large anonymous DDR buffers (mmap of /dev/zero, NOT malloc) and
reports MB/s:

  write : memset over the buffer       (W-only, libc-dependent)
  read  : volatile uint32_t sum loop   (R-only, libc-INdependent
                                        -- most trustworthy for
                                        cross-firmware comparison)
  copy  : memcpy between two buffers   (R+W, counted as 2x bytes)

CLI matches the existing clocks/cpubench shape:
  --size MB      buffer size per pass (default: 16; must exceed L2)
  --iters N      passes per op        (default: 16)
  --ops a,b,c    comma list of write,read,copy (default: all)
  --json         JSON output instead of YAML

Output is YAML by default with a `chip:` tag for context:

  membw:
    buffer_mb: 16
    iters: 16
    results:
      write:
        mb_per_sec: 2243
        duration_s: 0.120
      read:
        mb_per_sec: 421
        duration_s: 0.637
      copy:
        mb_per_sec: 1863
        duration_s: 0.288
    chip: hi3516ev300

Use case (from #161 / #162 debugging): when two boards with the
same SoC behave differently, this separates "CPU pipeline is the
bottleneck" from "DDR pipeline is the bottleneck" in a few seconds.
With APLL decode and HPM bin now in `ipctool clocks` from #162-#164,
this PR closes the third leg of the same investigation flow.

Verified on four lab boards (all with majestic / vendor App stopped
to measure DDR config baseline rather than workload):

  hi3516ev300 (V4, OpenIPC):   write 2243   read 421   copy 1863  MB/s
  gk7205v300  (V4, OpenIPC):   write 2096   read 417   copy 1633  MB/s
  gk7205v300  (V4, XM Sofia):  write 1576   read 370   copy 1302  MB/s  [--size 4]
  hi3516av300 (V4A, OpenIPC):  write 2320   read 427   copy 2440  MB/s

XM Sofia ran with --size 4 because the board has only 48 MB
userspace memory (the rest is mmz_anonymous for the encoder), so
the default 32 MB total (2 x 16 MB buffers) doesn't fit -- confirms
--size is a genuine knob, not just a tunable.

Buffer-via-mmap caveat baked in per the issue: anonymous DDR pages
rather than tmpfs / page cache.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request May 14, 2026
Replaces the build-toolchain-scp-dump dance from the #161/#162 work
with a single `ipctool bootrom` call. Found during issue #160 follow-up
when comparing mask-ROM accessibility across Goke V4 / HiSi V4 / V4A.

Default (no flags): YAML metadata -- base, size, accessibility,
first 16 bytes hex, chip name.

  bootrom:
    base: 0x04000000
    size: 65536
    accessible: 1
    first_bytes: 10 00 00 ea 14 f0 9f e5 14 f0 9f e5 14 f0 9f e5
    chip: hi3516av300

`accessible: 0` means the region reads as all zeros -- the Goke V4
case where the silicon hides the bootrom from userspace post-boot
(see the corresponding kaeru lesson). For HiSi V4 / V4A boards the
flag reliably reports 1.

`--dump` writes the raw binary region to stdout (intended for
redirect/pipe). No metadata preamble. Equivalent to the stand-alone
dump_bootrom binary used in the earlier diagnostic but built into
ipctool so no separate toolchain is needed.

Per-family defaults sourced from the V4A mask-ROM RE
(HI3516CV500-SDK/sdk/bootrom/bootrom.cpp: `BOOTROM = 0x04000000`,
SRAM = 0x04010000, so 64 KB max). Currently shipping defaults for
HISI_V4 (0x3516E300) and HISI_V4A (0x3516C500); other families
require explicit --base / --size.

Cross-board MD5 of `ipctool bootrom --dump | md5sum` reproduces the
stand-alone dumper results exactly:

  hi3516ev300 (V4 HiSi):    669081159f4fed97ad4db0bfd5251dca  (real ROM)
  gk7205v300  (V4 Goke):    fcd6bcb56c1689fcef28b57c22475bad  (all zeros)
  hi3516av300 (V4A):        1c2198a0a30c8258f5ab0c27aa523ff9  (real ROM,
                                                              different mask)

The `accessible` heuristic (all-zero buffer detector) correctly
flags the Goke V4 case as `0` and the two real-ROM cases as `1`.

CLI matches the existing clocks/cpubench/membw shape:

  ipctool bootrom [--base ADDR] [--size N] [--dump] [--json]

Note: --base/--size accept any address; reading non-bootrom MMIO
regions byte-by-byte (e.g. CRG at 0x12010000) will SIGBUS as the
kernel signals an unaligned/word-only access. That's appropriate
behavior -- the defaults are the supported path. Documented in
the --help output.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Show current CPU clock, DDR clock, and per-die HPM characterization

1 participant