Add membw subcommand: DDR bandwidth probe (memset / read / memcpy) #165
Merged
Closes #160.

`ipctool membw` runs three synthetic memory-bandwidth ops against large
anonymous DDR buffers (mmap of /dev/zero, NOT malloc) and reports MB/s:

    write : memset over the buffer (W-only, libc-dependent)
    read  : volatile uint32_t sum loop (R-only, libc-INdependent --
            most trustworthy for cross-firmware comparison)
    copy  : memcpy between two buffers (R+W, counted as 2x bytes)

CLI matches the existing clocks/cpubench shape:

    --size MB    buffer size per pass (default: 16; must exceed L2)
    --iters N    passes per op (default: 16)
    --ops a,b,c  comma list of write,read,copy (default: all)
    --json       JSON output instead of YAML

Output is YAML by default with a `chip:` tag for context:

    membw:
      buffer_mb: 16
      iters: 16
      results:
        write:
          mb_per_sec: 2243
          duration_s: 0.120
        read:
          mb_per_sec: 421
          duration_s: 0.637
        copy:
          mb_per_sec: 1863
          duration_s: 0.288
    chip: hi3516ev300

Use case (from #161 / #162 debugging): when two boards with the same SoC
behave differently, this separates "CPU pipeline is the bottleneck" from
"DDR pipeline is the bottleneck" in a few seconds. With APLL decode and
HPM bin now in `ipctool clocks` from #162-#164, this PR closes the third
leg of the same investigation flow.

Verified on four lab boards (all with majestic / vendor App stopped to
measure DDR config baseline rather than workload):

    hi3516ev300 (V4,  OpenIPC):  write 2243  read 421  copy 1863 MB/s
    gk7205v300  (V4,  OpenIPC):  write 2096  read 417  copy 1633 MB/s
    gk7205v300  (V4,  XM Sofia): write 1576  read 370  copy 1302 MB/s  [--size 4]
    hi3516av300 (V4A, OpenIPC):  write 2320  read 427  copy 2440 MB/s

XM Sofia ran with --size 4 because the board has only 48 MB userspace
memory (the rest is mmz_anonymous for the encoder), so the default 32 MB
total (2 x 16 MB buffers) doesn't fit -- confirms --size is a genuine
knob, not just a tunable.

Buffer-via-mmap caveat baked in per the issue: anonymous DDR pages rather
than tmpfs / page cache.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #160.
Summary
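`ipctool membw` runs three synthetic memory-bandwidth ops against large anonymous DDR buffers (mmap of `/dev/zero`, NOT malloc) and reports MB/s:

- `write`: memset over the buffer (W-only, libc-dependent)
- `read`: volatile `uint32_t` sum loop (R-only, libc-independent)
- `copy`: memcpy between two buffers (R+W, counted as 2x bytes)

The `read` op is the most trustworthy number when comparing across firmwares (musl vs uClibc vs glibc); `write`/`copy` are bounded by libc memset/memcpy vectorization.

CLI

Matches the existing `clocks`/`cpubench` shape:

    --size MB    buffer size per pass (default: 16; must exceed L2)
    --iters N    passes per op (default: 16)
    --ops a,b,c  comma list of write,read,copy (default: all)
    --json       JSON output instead of YAML

Sample output

    membw:
      buffer_mb: 16
      iters: 16
      results:
        write:
          mb_per_sec: 2243
          duration_s: 0.120
        read:
          mb_per_sec: 421
          duration_s: 0.637
        copy:
          mb_per_sec: 1863
          duration_s: 0.288
    chip: hi3516ev300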
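For readers who want to see the shape of the three ops, here is a minimal sketch in C. It is not ipctool's source: `map_zero`, `read_pass`, `mb_per_sec`, `membw_sketch`, the `BARRIER()` macro (a GCC/Clang extension) and the `MEMBW_SKETCH_MAIN` guard are all hypothetical names, and the pre-fault pass and the decimal-MB unit are assumptions rather than anything the PR states.

```c
/* Hypothetical sketch of the three membw ops; not ipctool's actual source. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Compiler-only barrier (GCC/Clang extension) so timed passes are not elided. */
#define BARRIER() __asm__ volatile("" ::: "memory")

/* Map `bytes` of private, zero-filled memory backed by /dev/zero. */
static void *map_zero(size_t bytes) {
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0) return NULL;
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}

static double now_s(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Read pass: the volatile accumulator forces a real update per element. */
static uint32_t read_pass(const uint32_t *buf, size_t bytes) {
    volatile uint32_t sum = 0;
    for (size_t i = 0; i < bytes / sizeof(uint32_t); i++)
        sum += buf[i];
    return sum;
}

/* Assumption: "MB" means 10^6 bytes. Returns 0 for a non-positive duration. */
static double mb_per_sec(double bytes_moved, double seconds) {
    return seconds > 0.0 ? bytes_moved / 1e6 / seconds : 0.0;
}

/* Runs write, read, copy and stores MB/s in out[0..2]; -1 if mmap fails. */
static int membw_sketch(size_t bytes, int iters, double out[3]) {
    assert(iters > 0 && bytes % sizeof(uint32_t) == 0);
    uint8_t *a = map_zero(bytes), *b = map_zero(bytes);
    if (!a || !b) {
        if (a) munmap(a, bytes);
        if (b) munmap(b, bytes);
        return -1;
    }
    memset(a, 1, bytes); /* assumption: pre-fault so page faults are not timed */
    memset(b, 1, bytes);
    double total = (double)bytes * iters, t;

    t = now_s();
    for (int i = 0; i < iters; i++) { memset(a, i, bytes); BARRIER(); }
    out[0] = mb_per_sec(total, now_s() - t);

    t = now_s();
    for (int i = 0; i < iters; i++) (void)read_pass((const uint32_t *)a, bytes);
    out[1] = mb_per_sec(total, now_s() - t);

    t = now_s();
    for (int i = 0; i < iters; i++) { memcpy(b, a, bytes); BARRIER(); }
    out[2] = mb_per_sec(2.0 * total, now_s() - t); /* copy reads and writes: 2x bytes */

    munmap(a, bytes);
    munmap(b, bytes);
    return 0;
}

#ifdef MEMBW_SKETCH_MAIN
int main(void) {
    double r[3];
    if (membw_sketch((size_t)16 << 20, 16, r) != 0) { perror("membw_sketch"); return 1; }
    printf("write %.0f read %.0f copy %.0f MB/s\n", r[0], r[1], r[2]);
    return 0;
}
#endif
```

Building with something like `cc -O2 -DMEMBW_SKETCH_MAIN` gives a standalone binary. As a rough cross-check of the unit assumption, the sample `read` figure is consistent with decimal megabytes over a 16 MiB buffer: 16 passes of 16 MiB in 0.637 s works out to about 421 MB/s. That is an inference from the sample output, not something the PR states.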
Why this fits

The `clocks` (#162-#164) + `cpubench` (#162) + this `membw` PR together close the diagnostic loop motivated by #161: when two boards on the same SoC behave differently, you can now separate CPU-pipeline (`cpubench`), DDR-pipeline (this PR), and PLL-config (`clocks`) in three quick subcommands.

Caveats baked into design (per issue body)

- Buffers come from `mmap(/dev/zero, MAP_PRIVATE)`, so anonymous DDR pages rather than tmpfs / page cache.

Cross-board verification
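All four lab boards, majestic / App paused for the DDR baseline:

| Board (family, firmware) | write | read | copy | Notes |
| --- | --- | --- | --- | --- |
| hi3516ev300 (V4, OpenIPC) | 2243 | **421** | 1863 | |
| gk7205v300 (V4, OpenIPC) | 2096 | **417** | 1633 | |
| gk7205v300 (V4, XM Sofia) | 1576 | **370** | 1302 | `--size 4` |
| hi3516av300 (V4A, OpenIPC) | 2320 | **427** | 2440 | |

(MB/s; `read` is the libc-independent number, shown in bold)

XM Sofia ran with `--size 4` because the board has only 48 MB of userspace memory (the rest is `mmz_anonymous` for the encoder), so the default 32 MB total (2 × 16 MB buffers) doesn't fit. This confirms `--size` is a genuinely useful knob, not just a tunable. V4A copy is notably higher because it's dual-core SMP.

Test plan

- `-Wextra`
- `ipctool membw --help` prints concise usage
- `ipctool membw --json` produces valid JSON via `jq` round-trip
- `ipctool membw --ops read,copy` skips the `write` op
- `ipctool membw --size 1` errors cleanly ("must exceed L2" is documented but not enforced; caller's choice)
- `--size` returns clean error (`membw: mmap N MB: Cannot allocate memory`) rather than crash

🤖 Generated with Claude Code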
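The `--ops read,copy` check in the test plan relies on the comma list being turned into a set of selected ops. One way that could look is sketched below in C. `parse_ops` and the `OP_*` constants are hypothetical names, and rejecting unknown or empty names with `-1` is an assumption; the PR does not say how ipctool handles them.

```c
/* Hypothetical --ops parser; names are illustrative, not ipctool's. */
#include <stddef.h>
#include <string.h>

enum { OP_WRITE = 1, OP_READ = 2, OP_COPY = 4, OP_ALL = 7 };

/* Returns a bitmask of the selected ops, OP_ALL when `list` is NULL
 * (option not given), or -1 on an unknown or empty name (assumed behavior). */
static int parse_ops(const char *list) {
    static const struct { const char *name; int bit; } known[] = {
        {"write", OP_WRITE}, {"read", OP_READ}, {"copy", OP_COPY},
    };
    if (!list) return OP_ALL;
    int mask = 0;
    for (;;) {
        size_t len = strcspn(list, ","); /* length of the current item */
        int bit = 0;
        for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
            if (strlen(known[i].name) == len &&
                memcmp(list, known[i].name, len) == 0)
                bit = known[i].bit;
        if (!bit) return -1;
        mask |= bit;
        if (list[len] == '\0') break; /* last item consumed */
        list += len + 1;              /* skip past the comma */
    }
    return mask;
}
```

With this shape, `parse_ops("read,copy")` leaves the `OP_WRITE` bit clear, so a caller that tests the mask before each op would skip `write`.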