[XPU] Fix illegal instruction error when running Intel P800-compiled RDMA libs on Hygon P800 #7935
Thanks for your contribution!
…RDMA libs on Hygon P800
fe8b54c to 1b1b198
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-27 10:25:05
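📋 Review Summary
PR overview: Fixes an import failure that occurs when the RDMA library compiled on an Intel P800 machine runs on a Hygon P800 machine, caused by illegal instructions generated under -march=native.
Scope of changes: cache_manager/transfer_factory/kvcache_transfer/CMakeLists.txt, setup.py
Impact tags: [XPU] [KVCache] [PD Disaggregation]
Issues
| Level | File | Summary |
|---|---|---|
| ❓ Question | CMakeLists.txt:18 | The condition only covers DEVICE_TYPE=xpu; do other device types also need cross-platform deployment? |
| ❓ Question | CMakeLists.txt:19 | -march=x86-64-v3 requires GCC 11+; please confirm the minimum version in the build environment |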
Status of previous findings
| Finding | Issue | Status |
|---|---|---|
| F1 | In the PR description, Usage or Command and Accuracy Tests are empty, and no Checklist items are checked | |
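📝 PR Compliance Check
The title format is compliant (it includes the official [XPU] tag). In the description, Motivation and Modifications are filled in, but the Usage or Command and Accuracy Tests sections are empty and no Checklist items are checked.
Suggested title (can be copied directly):
[XPU] Fix illegal instruction error when running Intel P800-compiled RDMA libs on Hygon P800
Suggested PR description (can be copied directly):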
## Motivation
Cross-platform binary incompatibility — a shared library compiled with -march=native on Intel generates instructions that are illegal on Hygon (AMD-based) CPUs. The RDMA library compiled on an Intel P800 machine, when run on a Hygon (海光) P800 machine, causes an "illegal instruction" error upon `import rdma_comm`, preventing the cache messenger from starting.
## Modifications
- `fastdeploy/cache_manager/transfer_factory/kvcache_transfer/CMakeLists.txt`: when `DEVICE_TYPE=xpu` and the architecture is `x86_64/AMD64`, replace `-march=native` with `-march=x86-64-v3` (AVX2 baseline); all other cases keep `-march=native`.
- `setup.py`: add `-DDEVICE_TYPE={get_device_type()}` to the cmake arguments so the device type is passed to CMake and the conditional compilation logic takes effect.
## Usage or Command
N/A
## Accuracy Tests
N/A (build option change; does not affect model output)
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
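- [x] Add unit tests. Please write the reason in this PR if no unit tests. (Build option change; no unit tests needed)
- [x] Provide accuracy results. (N/A; does not affect model output)
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The approach is correct: conditional compilation replaces -march=native with the portable -march=x86-64-v3 in the XPU + x86_64 case, which resolves the binary incompatibility across CPU vendors. Two questions need the author's confirmation before this can be merged.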
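As an illustration of the setup.py side of the change, here is a minimal Python sketch of forwarding the device type to CMake. Only the `-DDEVICE_TYPE={get_device_type()}` argument comes from the PR; the body of `get_device_type`, the `FD_DEVICE_TYPE` environment variable, and the `build_cmake_args` helper are hypothetical stand-ins, not the project's actual code.

```python
import os


def get_device_type() -> str:
    """Hypothetical stand-in for the helper named in the PR.

    Assumption: the device type is read from an environment variable called
    FD_DEVICE_TYPE. The real setup.py may determine it differently.
    """
    return os.environ.get("FD_DEVICE_TYPE", "gpu").lower()


def build_cmake_args(source_dir: str, build_dir: str) -> list:
    """Assemble a cmake configure command that forwards the device type."""
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        # Forwarded so CMakeLists.txt can branch on DEVICE_TYPE.
        f"-DDEVICE_TYPE={get_device_type()}",
    ]
```

If the variable were not forwarded, `DEVICE_TYPE` would be undefined inside CMakeLists.txt and the `STREQUAL "xpu"` branch would never be taken, so the build would fall back to `-march=native`.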
set(CMAKE_CXX_COMPILER g++)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -Ofast -ffast-math -funroll-loops -march=native -std=c++11")
if(DEVICE_TYPE STREQUAL "xpu" AND CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64|AMD64")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -Ofast -ffast-math -funroll-loops -march=x86-64-v3 -std=c++11")
❓ Question: The condition only covers DEVICE_TYPE=xpu. Other device types (such as dcu) that are compiled on an Intel machine and run on a Hygon machine would hit the same illegal instruction error.
Is fixing only the XPU scenario intentional? If so, please explain the reason in a comment. If other device types also need cross-platform deployment, consider widening the condition to:
if(CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64|AMD64")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -Ofast -ffast-math -funroll-loops -march=x86-64-v3 -std=c++11")
else()
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -Ofast -ffast-math -funroll-loops -march=native -std=c++11")
endif()
For now, only XPU has a complicated hardware environment; other hardware can add this on its own if needed.
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -Ofast -ffast-math -funroll-loops -march=native -std=c++11")
if(DEVICE_TYPE STREQUAL "xpu" AND CMAKE_SYSTEM_PROCESSOR MATCHES "x86_64|AMD64")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -Ofast -ffast-math -funroll-loops -march=x86-64-v3 -std=c++11")
else()
❓ Question: The -march=x86-64-v3 level name is only supported by GCC 11+ (released in 2021). Please confirm that the minimum GCC version in the build environment meets this requirement; otherwise compilation fails with error: bad value (x86-64-v3) for -march= switch.
P800 builds are based on the ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:xpu-ubuntu2204-x86_64-gcc123-py310 image, whose default gcc version is 12.3.
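The GCC version question above could also be checked programmatically before configuring the build. The following is a minimal Python sketch under stated assumptions: the helper names are hypothetical, it relies on GCC's `-dumpfullversion` option (available in GCC 7 and later), and it is not part of the PR.

```python
import re
import subprocess


def parse_gcc_version(text: str) -> tuple:
    """Extract (major, minor, patch) from output such as '12.3.0'."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    # Missing minor/patch components are treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def supports_x86_64_v3(version: tuple) -> bool:
    """GCC accepts the -march=x86-64-v3 level name from major version 11 on."""
    return version[0] >= 11


def detect_gcc_version(compiler: str = "g++") -> tuple:
    """Ask the compiler for its version; raises if the command fails."""
    result = subprocess.run(
        [compiler, "-dumpfullversion"],
        check=True, capture_output=True, text=True,
    )
    return parse_gcc_version(result.stdout)
```

With the gcc 12.3 image mentioned in the reply, `supports_x86_64_v3(detect_gcc_version())` would be expected to return `True`; on a toolchain older than GCC 11 it would return `False`, matching the `bad value` error the reviewer describes.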
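CI report generated from the following code (updated every 30 minutes):
1 Task overview: main test tasks
2 Task status summary. Note on the log column: failed tasks use the log link pre-generated by the tool; running tasks use the Job link.
2.1 Required tasks: 8/10 passed
2.2 Optional tasks: 23/29 passed
3 Failure details (required only): no required tasks failed; no invocation is needed this round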
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7935 +/- ##
==========================================
Coverage ? 73.19%
==========================================
Files ? 404
Lines ? 56841
Branches ? 8890
==========================================
Hits ? 41604
Misses ? 12415
Partials ? 2822
Motivation
Cross-platform binary incompatibility — a shared library compiled with -march=native on Intel generates instructions that are illegal on Hygon (AMD-based) CPUs.
Modifications
The RDMA library compiled on an Intel P800 machine, when run on a Hygon (海光) P800 machine, causes an "illegal instruction" error upon import rdma_comm, preventing the cache messenger from starting. The root cause is that the Hygon P800 machine uses a different CPU with a different instruction set than the Intel P800 machine, so a binary built with -march=native on one cannot be assumed to run on the other.
Solution: Replace -march=native with a more portable option such as -march=x86-64-v3 (AVX2 baseline), depending on the Hygon CPU feature set required.
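Whether a given deployment host can run an x86-64-v3 binary can be estimated from its CPU flags. The sketch below is an illustration only, assuming a Linux host that exposes /proc/cpuinfo. The function names are hypothetical, the flag spellings follow how Linux is understood to report them (LZCNT appears as `abm`), and the check covers only the features that v3 adds on top of v2, not the full baseline.

```python
# Features that x86-64-v3 adds on top of x86-64-v2, in /proc/cpuinfo spelling.
# Assumption: Linux reports LZCNT as "abm" and the XSAVE feature as "xsave".
V3_FLAGS = frozenset(
    {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}
)


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the flag set of the first CPU listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_v3_flags(flags: set) -> set:
    """Return the v3-level flags that are absent from the given flag set."""
    return set(V3_FLAGS) - flags
```

On the target machine, passing the contents of /proc/cpuinfo through `parse_cpu_flags` and then `missing_v3_flags` yields an empty set when all of these flags are present; any flag it returns suggests that a lower `-march` level would be needed for that host.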
Usage or Command
Accuracy Tests
Checklist
- Add at least a tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]
- Format your code, run pre-commit before commit.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.