
Add membw subcommand: DDR bandwidth probe (memset / read / memcpy) #165

Merged
widgetii merged 1 commit into master from membw-issue-160 on May 14, 2026

Conversation

@widgetii (Member)

Closes #160.

Summary

ipctool membw runs three synthetic memory-bandwidth ops against large anonymous DDR buffers (mmap of /dev/zero, NOT malloc) and reports MB/s:

| Op | Bytes counted | Trustworthy across libcs? |
|---|---|---|
| write | sz | depends on libc memset vectorization |
| read | sz | yes: volatile uint32_t sum loop, libc-independent |
| copy | 2 × sz | depends on libc memcpy vectorization |

The read op is the most trustworthy number when comparing across firmwares (musl vs uClibc vs glibc); write / copy are bounded by libc memset/memcpy vectorization.
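The byte accounting and the libc-independent read loop can be sketched as follows (illustrative C with hypothetical `read_pass` / `bytes_counted` names; not necessarily ipctool's actual code):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* read op: libc-independent bandwidth probe; the volatile sum keeps
   the compiler from eliding the loads */
uint32_t read_pass(const uint32_t *buf, size_t words) {
    volatile uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum += buf[i];
    return sum;
}

/* bytes credited per pass: write = sz, read = sz, copy = 2 * sz
   (copy touches both a source and a destination buffer) */
size_t bytes_counted(const char *op, size_t sz) {
    return strcmp(op, "copy") == 0 ? 2 * sz : sz;
}
```

With the defaults (16 MB buffers, 16 iters, all three ops) this credits 16 × (sz + sz + 2 × sz) = 1024 MB, consistent with the "~1 GB total" figure quoted below.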

CLI

Matches the existing clocks / cpubench shape:

ipctool membw [--size MB] [--iters N] [--ops set,...] [--json]
  --size MB     buffer size per pass (default: 16; must exceed L2)
  --iters N     passes per op        (default: 16)
  --ops a,b,c   comma list of write,read,copy (default: all)
  --json        JSON output instead of YAML
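A minimal sketch of how the --ops comma list could be parsed (hypothetical helper; the PR's real option handling may differ):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Parse a "write,read,copy" subset into run[0..2] (write, read, copy).
   Returns false on an unknown op name. */
bool parse_ops(const char *arg, bool run[3]) {
    static const char *names[3] = { "write", "read", "copy" };
    char buf[64];  /* strtok modifies its input, so work on a copy */
    snprintf(buf, sizeof buf, "%s", arg);
    run[0] = run[1] = run[2] = false;
    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        bool found = false;
        for (int i = 0; i < 3; i++)
            if (strcmp(tok, names[i]) == 0)
                run[i] = found = true;
        if (!found)
            return false;
    }
    return true;
}
```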

Sample output

membw:
  buffer_mb: 16
  iters: 16
  results:
    write:
      mb_per_sec: 2243
      duration_s: 0.120
    read:
      mb_per_sec: 421
      duration_s: 0.637
    copy:
      mb_per_sec: 1863
      duration_s: 0.288
  chip: hi3516ev300

Why this fits

The clocks work (#162-#164), cpubench (#162), and this membw PR together close the diagnostic loop motivated by #161: when two boards on the same SoC behave differently, you can now separate the CPU pipeline (cpubench), the DDR pipeline (this PR), and the PLL configuration (clocks) with three quick subcommands.

Caveats baked into design (per issue body)

  • Buffers come from mmap(/dev/zero, MAP_PRIVATE), so they are anonymous DDR pages rather than tmpfs / page cache.
  • Default 16 MB per buffer comfortably exceeds the V4-family L2 (256 KB - 1 MB). Smaller sizes measure L2/L1, not DDR.
  • Streamer / encoder DMA traffic loads DDR. To measure the DDR config baseline, stop majestic / vendor App first; to measure real workload bandwidth, leave them running.
  • Default 16 MB × 16 iters × 3 ops processes ~1 GB total and takes <2 s on a healthy V4 board.
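The first caveat (anonymous DDR pages via /dev/zero, not malloc) can be sketched like this (a minimal illustration under that assumption, not the PR's exact allocation code):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `mb` megabytes of private, copy-on-write pages backed by
   /dev/zero: plain anonymous DDR memory, untouched by the heap
   allocator and by tmpfs / page cache. Returns NULL on failure. */
void *map_buffer(size_t mb) {
    size_t sz = mb << 20;
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping survives the close */
    return p == MAP_FAILED ? NULL : p;
}
```

A NULL return here makes it straightforward to report a clean allocation error instead of crashing when the requested size does not fit.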

Cross-board verification

All four lab boards, majestic / App paused for the DDR baseline:

| Board | Buffer | write | read | copy |
|---|---|---|---|---|
| hi3516ev300 (V4, OpenIPC) | 16 MB | 2243 | 421 | 1863 |
| gk7205v300 (V4, OpenIPC) | 16 MB | 2096 | 417 | 1633 |
| gk7205v300 (V4, XM Sofia) | 4 MB | 1576 | 370 | 1302 |
| hi3516av300 (V4A, OpenIPC) | 16 MB | 2320 | 427 | 2440 |

(All numbers MB/s; **read** is the libc-independent column.)

XM Sofia ran with --size 4 because the board has only 48 MB of userspace memory (the rest is mmz_anonymous for the encoder), so the default 32 MB total (2 × 16 MB copy buffers) doesn't fit. This confirms --size is a genuinely useful knob, not a speculative tunable. The V4A board's copy number is notably higher because it is a dual-core SMP part.

Test plan

  • cv100 toolchain build + UPX, no warnings under -Wextra
  • ipctool membw --help prints concise usage
  • ipctool membw --json produces valid JSON via jq round-trip
  • ipctool membw --ops read,copy skips the write op
  • ipctool membw --size 1 runs without complaint (the must-exceed-L2 rule is documented but deliberately not enforced; whether a cache-sized buffer is useful is left to the caller)
  • OOM on a too-large --size returns a clean error (membw: mmap N MB: Cannot allocate memory) rather than crashing
  • Verified on 4 boards above; numbers stable across runs to within ~1%
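For completeness, the reported figure is just total bytes moved over elapsed time; a trivial sketch (assuming MB means 2^20 bytes here, which is an assumption about the PR's convention):

```c
#include <stddef.h>

/* MB/s from total bytes moved and elapsed seconds */
double mb_per_sec(size_t total_bytes, double seconds) {
    return (double)total_bytes / (1024.0 * 1024.0) / seconds;
}
```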

🤖 Generated with Claude Code

widgetii merged commit 086cf8e into master on May 14, 2026
3 checks passed
widgetii deleted the membw-issue-160 branch on May 14, 2026 at 13:43

Development

Successfully merging this pull request may close these issues.

Add DDR bandwidth test subcommand (memset/memcpy/read scan)