Skip to content

Run RISC-V tests with multiple RVV QEMU configurations#19707

Open
luhenry wants to merge 6 commits into
pytorch:mainfrom
riseproject-dev:riscv-testing-rvv
Open

Run RISC-V tests with multiple RVV QEMU configurations#19707
luhenry wants to merge 6 commits into
pytorch:mainfrom
riseproject-dev:riscv-testing-rvv

Conversation

@luhenry
Copy link
Copy Markdown
Contributor

@luhenry luhenry commented May 20, 2026

Summary

This tackles Phase 3 of #18991, in continuation of #19399, #19521, #19617.

Test plan

Exclusively adding CI testing, no new feature. CI will cover the testing.

luhenry added 3 commits May 20, 2026 22:09
Given RISC-V allows different hardware implementations to have different
vector length (similar to ARM SVE), we want to make sure that we test
on different configurations. Luckily, QEMU allows us to simply set a
vlen=<128,256,512,...> parameter on QEMU_CPU to emulate different
vector length.
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented May 20, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19707

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure, 9 Unclassified Failures

As of commit e17eca9 with merge base 6ba868e (image):

NEW FAILURES - The following jobs have failed:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 20, 2026
@github-actions
Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@luhenry
Copy link
Copy Markdown
Contributor Author

luhenry commented May 20, 2026

cc @rascani @mergennachin, and @GregoryComer @digantdesai for XNNPACK

)
if(DEFINED EXECUTORCH_XNNPACK_LOG_LEVEL)
target_compile_definitions(
xnnpack-logging PUBLIC XNN_LOG_LEVEL=${EXECUTORCH_XNNPACK_LOG_LEVEL}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@digantdesai
Copy link
Copy Markdown
Contributor

@claude review this PR

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Claude finished @digantdesai's task in 1m 26s —— View job


PR Review: Run RISC-V tests with multiple RVV QEMU configurations

  • Read changed files and understand the PR
  • Review CI workflow changes
  • Review test scripts
  • Review CMake changes
  • Review etdump_summary.py
  • Post review feedback

Overall this is a solid PR that adds multi-VLEN QEMU coverage for RVV and integrates XNNPACK microkernel profiling into the CI workflow. A few observations:

Workflow Logic (_test_riscv.yml)

Potential issue: while read in a pipeline runs in a subshell. At line 59, the loop echo '...' | jq ... | while IFS= read -r qemu_cpu; do ... done runs in a subshell. If any iteration fails, the exit code from inside the loop won't propagate to the outer script due to the pipe. Since set -e (from the calling script) doesn't reach into subshells behind pipes, a QEMU failure mid-loop would be silently swallowed for intermediate iterations — only the last iteration's status would matter.

Consider using process substitution or a for loop with mapfile:

mapfile -t cpus < <(echo '${{ inputs.qemu-cpu }}' | jq -r '.[]')
for qemu_cpu in "${cpus[@]}"; do
  ...
done

Minor: $GITHUB_STEP_SUMMARY block assumes the etdump JSON exists. If --verbose-xnnpack isn't passed (non-XNNPACK path), the etdump JSON won't be generated, but the summary generation block still runs. The jq commands will fail. However, since --verbose-xnnpack is always passed when xnnpack=true (line 64), and the non-XNNPACK path doesn't iterate with RVV configs where this matters — this is likely fine in practice for the current matrix. But if someone later adds an XNNPACK test without --verbose-xnnpack, the summary block would error.

examples/riscv/run.sh

Good change: Replacing the mktemp + trap cleanup with a deterministic path (${output_dir}/${model}_riscv.run.log) is the right call — it makes the log available for downstream processing by etdump_summary.py.

The || true on etdump_summary.py (line 152) is reasonable for CI robustness, but means summary generation failures will be silent. Worth noting.

tools/cmake/preset/riscv64_linux.cmake

The EXECUTORCH_BUILD_RISCV_ETDUMP option is well-structured — gating event tracing behind a flag keeps default builds lean. The FLATCC_ALLOW_WERROR OFF is a pragmatic workaround for cross-compilation warnings.

One subtlety: when EXECUTORCH_BUILD_RISCV_ETDUMP is OFF, you explicitly set(EXECUTORCH_ENABLE_EVENT_TRACER OFF). If a user previously set EXECUTORCH_ENABLE_EVENT_TRACER ON via command line, this would override it. Consider using set_overridable_option for symmetry, or documenting that this preset enforces no event tracing unless the etdump flag is set.

backends/xnnpack/CMakeLists.txt

Clean addition. Applying XNN_LOG_LEVEL to xnnpack-logging as PUBLIC makes sense since consumers of this target need the define propagated.

examples/riscv/etdump_summary.py

Well-structured profiling tool. The regex-based kernel matching (_KIND_PATTERN) is thorough. A couple of notes:

  • The _DTYPE_TOKENS matching (line 122: all(d in sym for d in dtype_tokens)) does substring matching — e.g., "f32" would match "xnn_f32_..." but also "xnn_bf32_...". This is probably fine given the symbol naming conventions in XNNPACK, but worth being aware of.
  • ev.perf_data.raw values are used directly — are these guaranteed to be in milliseconds? The Inspector API docs should confirm the unit. If they're in a different unit (e.g., nanoseconds), the summary output would be misleading.

riscv64.ymlcase() Expression

The case() GitHub Actions expression (lines 56-65) is a relatively new feature. This looks correct, but worth verifying it's supported on the pytorch/test-infra runners (it requires Actions runtime ≥ 2.314).

CI Failures

The 5 "UNCLASSIFIED FAILURES" all show docker exec ... /exec failed with exit code 1 for the non-XNNPACK cases (xnnpack: false). These are the jobs using qemu-cpu: '["rv64,zba=true,zbb=true,zbs=true,v=false"]'. Worth investigating whether the v=false QEMU CPU flag is causing issues, or if this is a pre-existing infrastructure problem.


Summary: The PR is well-organized and the approach is sound. The main actionable concern is the subshell/pipe behavior in the workflow loop that could mask mid-loop failures. The rest are minor observations.

Comment thread examples/riscv/run.sh
Comment on lines +130 to +132
etdump_path="${output_dir}/${model}_riscv.etdump"
rm -f "${etdump_path}"
runner_extra_args+=(--etdump_path="${etdump_path}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: perhaps rename the verbose_xnnpack flag to debug or something else to indicate impact beyond xnnpack build

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to debug_xnnpack and passing --debug-xnnpack. It's still XNNPACK related but more debug than verbose.

# segments (e.g. `_gemm_ukernel_`, `_gemm_minmax_ukernel_`,
# `_gemm_minmax_fp32_ukernel_`, ...).
_INFIX = r"(?:[a-z0-9]+_)*"
_KIND_PATTERN = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better way to do this? more maintainable way I guess? pase the type_to_string enum or something?

Copy link
Copy Markdown
Contributor Author

@luhenry luhenry May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to suggestions, I'm not the most familiar with XNNPACK and how they name kernels.

pase the type_to_string enum or something?

I'll have a look at what's doable with that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing the source file from this py file seems a good starting point. I don't want this to break when we update xnnpack submodule.

Comment thread .github/workflows/riscv64.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants