
clocks: add V4A family (CV500 / AV300) - 4 PLLs + HPM (#164)

Merged: widgetii merged 1 commit into master from clocks-v4a-family on May 14, 2026

@widgetii (Member) commented:

Adds a clocks_family_v4a table covering Hi3516CV500 / Hi3516AV300 / Hi3516DV300 etc. (chip_generation = HISI_V4A = 0x3516C500).

Sources

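Register map sourced first-party from:

  • widgetii/HI3516CV500-SDK sdk/bootrom/bootrom-re/regmap-crg.h: V4A mask-ROM reverse engineering (CRG offsets)
  • HI3516CV500-SDK/sdk/fastburn/fastboot/Source/ddr_training.c: V4A SDK (HPM register addresses + bit layout)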

What's decoded

| Section | V4 | V4A (this PR) |
| --- | --- | --- |
| APLL pair | (CRG+0x00, CRG+0x04) | same |
| DPLL pair | dumped raw only (#163 dropped) | (CRG+0x08, CRG+0x0C), fully decoded |
| EPLL pair | dumped raw only (#163 dropped) | (CRG+0x10, CRG+0x14), fully decoded |
| VPLL pair | dumped raw only (#163 dropped) | (CRG+0x18, CRG+0x1C), fully decoded |
| PLL lock bit (CPU) | PERI_CRG_PLL122 bit 0 | same |
| PLL lock bits (DPLL/EPLL/VPLL) | not decoded | bits 1 / 3 / 2 |
| HPM_CHECK_REG | 0x1202015C (10-bit) | 0x12020098 (9-bit) |
| HPM aux fingerprint | 0x120280D8 | 0x120300D8 |
| DDR cksel mux | 0x12010080 + cksel table | not decoded: bit positions not in the RE; skipped per "decypher or remove" rule from #163 |

All four PLLs on V4A use the same Hi3516A APLL bit layout (FBDIV at ctrl_reg2[11:0]), unlike V4 where DPLL/EPLL had a different (still-unverified) layout. This is why V4A can decode all four cleanly while V4 still can't.
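As a rough illustration of what the shared layout means in practice, the sketch below extracts the divider fields from a PLL register pair and computes an output frequency. The field positions are the ones listed in this PR's commit message. The 24 MHz reference clock, the integer-only formula (any fractional field is ignored), and the names `pll_fields`, `pll_decode` and `pll_freq_mhz` are assumptions made for this example and are not taken from ipctool's source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed reference clock. It is consistent with the sample output
 * (24 * 75 / 2 = 900) but is not stated in the PR text. */
#define PLL_REF_MHZ 24u

struct pll_fields {
    uint32_t fbdiv;    /* ctrl_reg2[11:0]  */
    uint32_t refdiv;   /* ctrl_reg2[17:12] */
    uint32_t postdiv1; /* ctrl_reg1[26:24] */
    uint32_t postdiv2; /* ctrl_reg1[30:28] */
};

/* Extract bits [hi:lo] of a 32-bit register value. */
static inline uint32_t bits(uint32_t reg, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi - lo < 31u);
    return (reg >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

/* Split a raw register pair into its divider fields. */
static inline struct pll_fields pll_decode(uint32_t ctrl_reg1, uint32_t ctrl_reg2)
{
    struct pll_fields f;
    f.fbdiv    = bits(ctrl_reg2, 11, 0);
    f.refdiv   = bits(ctrl_reg2, 17, 12);
    f.postdiv1 = bits(ctrl_reg1, 26, 24);
    f.postdiv2 = bits(ctrl_reg1, 30, 28);
    return f;
}

/* Integer-mode output frequency in MHz; returns 0 if any divider is 0. */
static inline uint32_t pll_freq_mhz(const struct pll_fields *f)
{
    uint32_t div = f->refdiv * f->postdiv1 * f->postdiv2;
    if (div == 0)
        return 0;
    return PLL_REF_MHZ * f->fbdiv / div;
}
```

With FBDIV=75, POSTDIV1=2, POSTDIV2=1 and REFDIV=1 (REFDIV is not printed in the sample output, so 1 is assumed), this gives 24 × 75 / 2 = 900 MHz, which matches the `cpu_pll` line measured on hi3516av300.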

Brief survey shape

V4A's brief block falls back to 2 headline numbers (CPU clock + HPM bin), since DDR cksel isn't decoded yet:

clocks:
  cpu_pll:
    freq_mhz: 900
  hpm:
    bin: mid

V4 boards still get the same 8-line brief block as in #163 (cpu_pll + ddr.data_rate_mbps + hpm.bin).

Brief-mode tightening (V4A motivated)

The brief-PLL emitter now ships only the first PLL of the family (by convention the CPU/APLL). DPLL/EPLL/VPLL on V4A are detail and only appear in the full ipctool clocks output. V4 brief is unchanged (V4 only has one PLL entry anyway).
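The "first PLL only" rule can be sketched as follows. This is not the emitter from the PR: the `pll_entry` struct and `emit_plls` function are hypothetical, and the sketch assumes the family table is ordered with the CPU/APLL first, as the convention above describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, already-decoded PLL entry. */
struct pll_entry {
    const char *name;  /* YAML key, e.g. "cpu_pll" */
    unsigned freq_mhz; /* decoded output frequency */
};

/* Write the PLL entries as YAML under "clocks:". In brief mode only the
 * first entry is written. Returns the number of entries emitted. */
static inline size_t emit_plls(FILE *out, const struct pll_entry *plls,
                               size_t count, bool brief)
{
    assert(out != NULL && (plls != NULL || count == 0));
    size_t limit = (brief && count > 1) ? 1 : count;

    fprintf(out, "clocks:\n");
    for (size_t i = 0; i < limit; i++)
        fprintf(out, "  %s:\n    freq_mhz: %u\n", plls[i].name, plls[i].freq_mhz);
    return limit;
}
```

A family with a single PLL entry produces the same output in both modes, which is why the V4 brief block does not change.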

Sample full output (openipc-hi3516av300)

clocks:
  cpu_pll:    freq_mhz: 900   locked: true   (FBDIV=75  POSTDIV1=2  POSTDIV2=1)
  ddr_pll:    freq_mhz: 594   locked: true   (FBDIV=99  POSTDIV1=4  POSTDIV2=1)
  eth_pll:    freq_mhz: 528   locked: true   (FBDIV=44  POSTDIV1=2  POSTDIV2=1)
  video_pll:  freq_mhz: 552   locked: true   (FBDIV=46  POSTDIV1=2  POSTDIV2=1)
  hpm:        value: 260      bin: mid       aux_value: 0x81090109

Test plan

  • Verified on openipc-hi3516av300.dlab.doty.ru (dual-core Cortex-A7, SMP, Linux 4.9.37):
    • brief: 5 lines after clocks: (cpu_pll.freq_mhz + hpm.bin)
    • full: APLL=900 / DPLL=594 / EPLL=528 / VPLL=552 MHz, all locked: true; HPM value=260 (mid bin), aux=0x81090109
    • ipctool cpubench median = 899 MHz (about 0.1% below the register-decoded APLL)
  • V4 boards (hi3516ev300 OpenIPC + gk7205v300 OpenIPC) re-verified: same 8-line brief block as merged in #163 ("clocks: brief survey output, decode APLL lock, drop noisy notes"), no regression.
  • CI baseline build (cv100 toolchain + bare UPX) — no warnings under -Wextra.

Out of scope (TODOs in code comments)

  • DDR cksel mux on V4A — needs a board where DDR rate is independently known to anchor the cksel field.
  • Confirm the inferred VPLL=bit-2 lock-bit assignment (V4A bootrom RE only references bits 0/1/3 explicitly; bit 2 is inferred from 0x0F readback meaning all four are locked).
  • DPLL/EPLL FBDIV decode on V4 (their bit layout differs from V4A; still unanchored).
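For the lock-bit TODO above, a minimal sketch of the per-PLL lock decode is shown here. The bit positions are the ones given in this PR (APLL=0, DPLL=1, EPLL=3, VPLL=2, the last one inferred); the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum v4a_pll { V4A_APLL, V4A_DPLL, V4A_EPLL, V4A_VPLL, V4A_PLL_COUNT };

/* Lock-bit position of each PLL in PERI_CRG_PLL122 (CRG+0x1E8).
 * Bit 2 for VPLL is inferred from a 0x0F readback, not confirmed. */
static const unsigned v4a_lock_bit[V4A_PLL_COUNT] = {
    [V4A_APLL] = 0,
    [V4A_DPLL] = 1,
    [V4A_EPLL] = 3,
    [V4A_VPLL] = 2,
};

/* True if the given PLL's lock bit is set in the register value. */
static inline bool v4a_pll_locked(uint32_t pll122, enum v4a_pll pll)
{
    assert(pll < V4A_PLL_COUNT);
    return (pll122 >> v4a_lock_bit[pll]) & 1u;
}

/* The check the bootrom RE shows: APLL, DPLL and EPLL only (mask 0xB). */
static inline bool v4a_bootrom_plls_locked(uint32_t pll122)
{
    return (pll122 & 0xBu) == 0xBu;
}
```

The parentheses in `(pll122 & 0xBu) == 0xBu` matter in C, because `==` binds tighter than `&`. A readback of 0x0B passes the bootrom-style check while leaving the inferred VPLL bit clear, which is why that assignment still needs confirmation.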

🤖 Generated with Claude Code

Adds a `clocks_family_v4a` table covering Hi3516CV500 / Hi3516AV300 /
Hi3516DV300 etc. (chip_generation = HISI_V4A = 0x3516C500).

Register map sourced first-party from the V4A mask-ROM reverse
engineering at widgetii/HI3516CV500-SDK sdk/bootrom/bootrom-re/
regmap-crg.h (CRG offsets) and the V4A SDK
HI3516CV500-SDK/sdk/fastburn/fastboot/Source/ddr_training.c (HPM
register addresses + bit layout).

Decoded on V4A:
- APLL/DPLL/EPLL/VPLL register pairs at (CRG+0x00..0x1C). All four use
  the same Hi3516A APLL bit layout (FBDIV at ctrl_reg2[11:0], REFDIV
  at ctrl_reg2[17:12], POSTDIV1/POSTDIV2 at ctrl_reg1[26:24]/[30:28]),
  unlike V4 where DPLL/EPLL had a different (still-unverified) layout.
- Per-PLL lock state from PERI_CRG_PLL122 (CRG+0x1E8). APLL=bit0,
  DPLL=bit1, EPLL=bit3 (V4A bootrom RE polls `& 0xB == 0xB`); bit 2
  inferred as VPLL since the field board reads 0x0F (all four locked).
- HPM characterization at HPM_CHECK_REG = 0x12020098 (SC_CTRL region,
  not CRG): sys_hpm_core at bits [24:16] is 9-bit on V4A (vs 10-bit on
  V4), and the aux fingerprint moves to HPM_CORE_REG0 = 0x120300D8 in
  the MISC region. Same 150-350 validity window, same 190-310 nominal
  binning range as V4.
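The HPM item above can be sketched as a field extract plus a range check. Only the field position ([24:16], 9 bits), the 150-350 validity window and the 190-310 nominal range come from this commit message. The bin names other than "mid", the inclusive bounds, and all identifiers are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum hpm_bin { HPM_BIN_INVALID, HPM_BIN_LOW, HPM_BIN_MID, HPM_BIN_HIGH };

/* sys_hpm_core on V4A: bits [24:16] of HPM_CHECK_REG (9-bit field). */
static inline uint32_t v4a_hpm_core(uint32_t hpm_check_reg)
{
    return (hpm_check_reg >> 16) & 0x1FFu;
}

/* Classify an HPM value. Bounds are treated as inclusive (assumption). */
static inline enum hpm_bin hpm_classify(uint32_t value)
{
    if (value < 150 || value > 350)
        return HPM_BIN_INVALID; /* outside the validity window */
    if (value < 190)
        return HPM_BIN_LOW;     /* hypothetical label */
    if (value > 310)
        return HPM_BIN_HIGH;    /* hypothetical label */
    return HPM_BIN_MID;
}

/* Printable name for a bin; only "mid" appears in the real output. */
static inline const char *hpm_bin_name(enum hpm_bin bin)
{
    static const char *const names[] = { "invalid", "low", "mid", "high" };
    assert(bin <= HPM_BIN_HIGH);
    return names[bin];
}
```

The value 260 read on hi3516av300 falls inside the 190-310 range and therefore classifies as `mid`, in line with the verified output.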

Not yet decoded on V4A (kept out per "decypher or remove" rule from
PR #163):
- DDR cksel mux. PERI_CRG31 at 0x7C reads non-zero on hi3516av300 but
  the bit positions for the cksel field aren't in the bootrom RE and
  the V4A SDK clock driver doesn't expose DDR as a tunable Linux
  clock. Skip until a board with known DDR rate anchors it. Brief
  survey on V4A therefore omits `ddr.data_rate_mbps` -- it falls back
  to `cpu_pll.freq_mhz + hpm.bin`.

Also tightens the brief-mode emit: only the first PLL (by convention
the CPU/APLL) is shown in the survey output. DPLL/EPLL/VPLL on V4A
are detail and only appear in the full `ipctool clocks` output. The
V4 brief output is unchanged (V4 only has one PLL entry anyway).

Verified on openipc-hi3516av300.dlab.doty.ru (dual-core Cortex-A7,
SMP, Linux 4.9.37):
- brief:   cpu_pll.freq_mhz=900, hpm.bin=mid (5 lines after `clocks:`)
- full:    APLL=900 / DPLL=594 / EPLL=528 / VPLL=552 MHz, all locked;
           HPM value=260 (mid bin), aux_value=0x81090109
- cpubench median 899 MHz (about 0.1% below the register-decoded APLL,
  triangulated from dep_add/indep_add/dep_mul)

V4 boards re-verified: hi3516ev300 OpenIPC + gk7205v300 OpenIPC both
still produce the 8-line brief block they did in #163.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

widgetii force-pushed the clocks-v4a-family branch from bc1cbff to eb5f0ff on May 14, 2026 13:26
widgetii merged commit 593f966 into master on May 14, 2026 (3 checks passed)
widgetii deleted the clocks-v4a-family branch on May 14, 2026 13:27
widgetii added a commit that referenced this pull request May 14, 2026
…165)

Closes #160.

`ipctool membw` runs three synthetic memory-bandwidth ops against
large anonymous DDR buffers (mmap of /dev/zero, NOT malloc) and
reports MB/s:

  write : memset over the buffer       (W-only, libc-dependent)
  read  : volatile uint32_t sum loop   (R-only, libc-INdependent
                                        -- most trustworthy for
                                        cross-firmware comparison)
  copy  : memcpy between two buffers   (R+W, counted as 2x bytes)
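To make the read op concrete, here is a minimal sketch of a read-only pass over a buffer mapped from /dev/zero. It is not the code from this commit: the function names are invented, and since the text does not say whether the accumulator or the pointer is volatile, the sketch reads through a volatile pointer so the loads cannot be optimised away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Map `bytes` of private zero-filled memory via /dev/zero (not malloc).
 * Returns NULL on failure. */
static inline void *map_buffer(size_t bytes)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}

static inline double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run `iters` read-only passes over a fresh buffer and return the elapsed
 * time in seconds, or a negative value if the buffer cannot be mapped. */
static inline double read_pass_seconds(size_t bytes, unsigned iters)
{
    assert(bytes >= sizeof(uint32_t) && iters > 0);
    void *buf = map_buffer(bytes);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0xA5, bytes); /* touch every page before timing */

    const volatile uint32_t *words = buf;
    size_t n = bytes / sizeof(uint32_t);
    uint32_t sum = 0;

    double t0 = now_s();
    for (unsigned it = 0; it < iters; it++)
        for (size_t i = 0; i < n; i++)
            sum += words[i];
    double elapsed = now_s() - t0;

    (void)sum; /* the value itself is irrelevant, only the loads matter */
    munmap(buf, bytes);
    return elapsed;
}
```

Because the loop is plain C rather than a libc routine, its speed does not depend on how a given firmware's libc implements memset or memcpy, which is the property the list above calls out for the read op.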

CLI matches the existing clocks/cpubench shape:
  --size MB      buffer size per pass (default: 16; must exceed L2)
  --iters N      passes per op        (default: 16)
  --ops a,b,c    comma list of write,read,copy (default: all)
  --json         JSON output instead of YAML
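One way to handle the `--ops` comma list is to turn it into a bitmask, as in the sketch below. The actual parser is not shown in this commit message, so `parse_ops` and the `OP_*` constants are hypothetical. Rejecting unknown or empty tokens is also a choice made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { OP_WRITE = 1 << 0, OP_READ = 1 << 1, OP_COPY = 1 << 2 };

/* Parse a comma-separated list such as "write,copy" into a bitmask.
 * Returns -1 if any token is empty or not one of write/read/copy. */
static inline int parse_ops(const char *list)
{
    static const struct { const char *name; int bit; } ops[] = {
        { "write", OP_WRITE }, { "read", OP_READ }, { "copy", OP_COPY },
    };
    assert(list != NULL);

    int mask = 0;
    const char *p = list;
    for (;;) {
        size_t len = strcspn(p, ","); /* length of the current token */
        int bit = 0;
        for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
            if (strlen(ops[i].name) == len && strncmp(p, ops[i].name, len) == 0)
                bit = ops[i].bit;
        if (bit == 0)
            return -1;
        mask |= bit;
        if (p[len] == '\0')
            break;
        p += len + 1; /* skip past the comma */
    }
    return mask;
}
```

Under this sketch, `parse_ops("write,read,copy")` yields all three bits, which would correspond to the documented default of running every op.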

Output is YAML by default with a `chip:` tag for context:

  membw:
    buffer_mb: 16
    iters: 16
    results:
      write:
        mb_per_sec: 2243
        duration_s: 0.120
      read:
        mb_per_sec: 421
        duration_s: 0.637
      copy:
        mb_per_sec: 1863
        duration_s: 0.288
    chip: hi3516ev300
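The relation between `buffer_mb`, `iters`, `duration_s` and `mb_per_sec` in this sample can be reproduced with the small helper below. The unit interpretation is inferred from the numbers rather than stated: they fit buffers sized in MiB (2^20 bytes) and throughput in units of 10^6 bytes per second. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Throughput in MB/s (10^6 bytes per second, inferred) for `iters` passes
 * over a buffer of `buffer_mib` MiB. A copy moves every byte twice (one
 * read, one write), so its byte count is doubled. */
static inline double mb_per_sec(unsigned buffer_mib, unsigned iters,
                                double duration_s, bool is_copy)
{
    assert(duration_s > 0.0);
    double bytes = (double)buffer_mib * 1048576.0 * (double)iters;
    if (is_copy)
        bytes *= 2.0;
    return bytes / 1e6 / duration_s;
}
```

For the read line, 16 MiB × 16 passes is 268,435,456 bytes, and dividing by 0.637 s gives about 421.4 MB/s, close to the reported 421. The write and copy lines work out to roughly 2237 and 1864 MB/s against the reported 2243 and 1863. The small differences are what you would expect from durations printed to three decimals.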

Use case (from #161 / #162 debugging): when two boards with the
same SoC behave differently, this separates "CPU pipeline is the
bottleneck" from "DDR pipeline is the bottleneck" in a few seconds.
With APLL decode and HPM bin now in `ipctool clocks` from #162-#164,
this PR closes the third leg of the same investigation flow.

Verified on four lab boards (all with majestic / vendor App stopped
to measure DDR config baseline rather than workload):

  hi3516ev300 (V4, OpenIPC):   write 2243   read 421   copy 1863  MB/s
  gk7205v300  (V4, OpenIPC):   write 2096   read 417   copy 1633  MB/s
  gk7205v300  (V4, XM Sofia):  write 1576   read 370   copy 1302  MB/s  [--size 4]
  hi3516av300 (V4A, OpenIPC):  write 2320   read 427   copy 2440  MB/s

XM Sofia ran with --size 4 because the board has only 48 MB of
userspace memory (the rest is mmz_anonymous for the encoder), so
the default 32 MB total (2 x 16 MB buffers) doesn't fit. This
confirms that --size is needed in practice, not just a convenience.

Buffer-via-mmap caveat baked in per the issue: anonymous DDR pages
rather than tmpfs / page cache.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>