[Feature] Support multimodal RoPE 3D -> 1D in append_attention#7932
xiaoxiaohehe001 wants to merge 4 commits
Thanks for your contribution!
CI report (refreshed every 30 minutes):
1 Task overview: 1 Required task has failed and blocks merging; it should be handled first.
2 Task status summary
2.1 Required tasks: 6/7 passed
2.2 Optional tasks: 13/13 passed
3 Failure details (Required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: insufficient coverage (confidence: high)
Suggested fix: add unit tests for rotary_embedding.py L653-672 and mtp.py L544-552.
Codecov Report ❌
Additional details and impacted files:
@@ Coverage Diff @@
## release/online/20260415 #7932 +/- ##
==========================================================
Coverage ? 72.31%
==========================================================
Files ? 388
Lines ? 54137
Branches ? 8490
==========================================================
Hits ? 39148
Misses ? 12284
Partials ? 2705
5e697f2 to 12b25e8
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-29 20:46:57
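📋 Review summary
PR overview: Supports RoPE 3D -> 1D conversion in multimodal scenarios by adding a rope_3d_delta parameter that applies a batch-indexed position offset in the decode, encode, and speculate paths.
Change scope: custom_ops/gpu_ops/append_attn/, fastdeploy/model_executor/layers/attention/, fastdeploy/worker/, fastdeploy/spec_decode/
Impact tags: [OP] [Speculative Decoding] [Engine]
Issues
No blocking issues found. Some historical findings are still not fully fixed; see below.
Status of historical findings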
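| Finding | Issue | Status |
|---|---|---|
| F1 | The encoder path lacks a check for the rope_3d_delta + use_neox_style combination | |
| F2 | The `# Fix` comment is too vague | |
| F3 | The rope_3d_delta guard logic in the encoder/speculate paths is inconsistent with the decoder path | 🔄 Partially fixed |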
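Note on F3: the speculate path now validates rope_3d_delta against NeoX style and cache_quant_type (speculate_write_cache_with_rope_kernel.cu:574-585), but the encoder path (encoder_write_cache_with_rope_kernel.h) still has no corresponding guard. In addition, the flat if/else chain in the speculate path (if rope_3d_delta ... else if rope_3d ... else ...) differs semantically from the nested structure in the decoder path (if rope_3d { if rope_3d_delta ... else ... }): the speculate path would still apply the delta when rope_3d=false. This case does not occur in practice, because the Python side guarantees that rope_3d is True whenever rope_3d_delta is non-null.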
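The difference between the two branch structures described in the F3 note can be made concrete with a small truth-table check. This is a minimal Python sketch, not the CUDA code from the PR: the function names and the string labels are hypothetical, and each function only reports which branch would be taken.

```python
from typing import List, Optional, Tuple


def decoder_style_branch(rope_3d: bool, rope_3d_delta: Optional[list]) -> str:
    """Nested form: the delta is only consulted when rope_3d is set."""
    if rope_3d:
        if rope_3d_delta is not None:
            return "delta"
        return "rope_3d"
    return "default"


def speculate_style_branch(rope_3d: bool, rope_3d_delta: Optional[list]) -> str:
    """Flat form: the delta is checked first, regardless of rope_3d."""
    if rope_3d_delta is not None:
        return "delta"
    elif rope_3d:
        return "rope_3d"
    return "default"


def mismatches() -> List[Tuple[bool, bool]]:
    """Return (rope_3d, delta_present) pairs where the two forms disagree."""
    out = []
    for rope_3d in (False, True):
        for delta in (None, [0]):
            if decoder_style_branch(rope_3d, delta) != speculate_style_branch(rope_3d, delta):
                out.append((rope_3d, delta is not None))
    return out


print(mismatches())  # [(False, True)]
```

The only disagreeing input is rope_3d=False with a non-null delta, which matches the review's observation that the divergence is unreachable as long as the Python side keeps rope_3d True whenever rope_3d_delta is set.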
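📝 PR convention check
The target branch is release/online/20260415 (not develop), so the title is missing the [Cherry-Pick] prefix. The PR description uses ### rather than ## for section headings, lacks a ## Accuracy Tests section, and its Checklist items do not match the standard template.
Suggested title (can be copied directly):
[Cherry-Pick][Feature] Support multimodal RoPE 3D -> 1D in append_attention(#<original PR number>)
Suggested PR description (can be copied directly)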
## Motivation
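Support RoPE 3D -> 1D conversion in multimodal scenarios so that the append_attention family of kernels handles rotary position embeddings for multimodal inputs correctly. A new `rope_3d_delta` parameter applies a batch-indexed position offset in the decode, encode, and speculate paths, replacing the previous fixed offset of `ori_bi * max_seq_len * head_size`.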
## Modifications
- `custom_ops/gpu_ops/append_attn/decoder_write_cache_with_rope_impl.cuh` / `_kernel.cu` / `_kernel.h`: add the `rope_3d_delta` parameter; the kernel computes `rope_pos = write_seq_id + rope_3d_delta[bid]` from it
- `custom_ops/gpu_ops/append_attn/encoder_write_cache_with_rope_impl.cuh` / `_kernel.h`: pass the new `rope_3d_delta` parameter through as well
- `custom_ops/gpu_ops/append_attn/speculate_write_cache_with_rope_impl.cuh` / `_kernel.cu` / `_kernel.h`: pass the new `rope_3d_delta` parameter through as well
- `custom_ops/gpu_ops/append_attention.cu`, `cpp_extensions.cc`: forward `rope_3d_delta` through the `AppendAttention` / `AppendAttentionWithOutput` interfaces
- `fastdeploy/model_executor/forward_meta.py`: add a `rope_3d_delta` field to `ForwardMeta`
- `fastdeploy/model_executor/layers/rotary_embedding.py`: add the 3D -> 1D conversion logic
- `fastdeploy/model_executor/layers/attention/{append_attn_backend,flash_mask_attn_backend,ops/append_attention}.py`: pass `rope_3d_delta` along the attention call chain
- `fastdeploy/worker/gpu_model_runner.py`, `fastdeploy/worker/input_batch.py`: build and maintain `rope_3d_delta` during scheduling
- `fastdeploy/spec_decode/mtp.py`, `fastdeploy/engine/common_engine.py`: adapt the speculative decoding / engine paths accordingly
- `tests/layers/test_append_attention*.py`, `tests/operators/test_tree_mask.py`, `tests/deterministic/test_c16_warp1_4_determinism.py`: supply the `rope_3d_delta` argument
## Usage or Command
N/A (multimodal models enable this internally based on the `rope_3d` configuration; no user-side changes are needed)
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
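- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The implementation quality is good overall. rope_3d_delta is passed end to end from the Python scheduling layer (allocated in input_batch.py, computed in gpu_model_runner.py) down to the CUDA kernels, and the guard checks on the decoder path are fairly thorough. A follow-up iteration should add the missing guard checks to the encoder path so that all three paths stay consistent.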
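The position rule quoted in the suggested description, `rope_pos = write_seq_id + rope_3d_delta[bid]`, can be illustrated in plain Python. This is a rough sketch under stated assumptions, not the kernel code: the table layout, the `base` value, the helper names, and the example delta values are all hypothetical, and the legacy formula is assumed to be an element offset into a per-request slab of the rotary table.

```python
import math
from typing import List, Tuple


def build_rope_table(max_pos: int, head_size: int, base: float = 10000.0) -> List[List[Tuple[float, float]]]:
    """Assumed shared 1D table: row p holds (cos, sin) per frequency at position p."""
    half = head_size // 2
    inv_freq = [base ** (-2.0 * i / head_size) for i in range(half)]
    return [[(math.cos(p * f), math.sin(p * f)) for f in inv_freq] for p in range(max_pos)]


def rope_pos_with_delta(write_seq_id: int, bid: int, rope_3d_delta: List[int]) -> int:
    """New rule from the PR text: shift the write position by a per-request delta."""
    return write_seq_id + rope_3d_delta[bid]


def legacy_rope_3d_offset(ori_bi: int, max_seq_len: int, head_size: int) -> int:
    """Previous fixed offset named in the PR text (interpretation is an assumption)."""
    return ori_bi * max_seq_len * head_size


table = build_rope_table(max_pos=16, head_size=4)
rope_3d_delta = [0, -3]  # hypothetical: request 1 has a position shift of -3
pos = rope_pos_with_delta(write_seq_id=10, bid=1, rope_3d_delta=rope_3d_delta)
cos_sin_row = table[pos]  # one shared row lookup instead of a per-request slab
```

Under these assumptions, every request reads from the same 1D table and only the small `rope_3d_delta` array varies per request, whereas the legacy offset grows with `max_seq_len * head_size` for each batch index.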
Motivation
Support RoPE 3D -> 1D conversion in multimodal scenarios so that the append_attention family of kernels handles rotary position embeddings for multimodal inputs correctly.
Modifications
Kernel side (custom_ops/gpu_ops/append_attn)
- decoder_write_cache_with_rope / encoder_write_cache_with_rope / speculate_write_cache_with_rope: add handling for RoPE 3D -> 1D; the kernel arguments gain the rope_3d_position_ids / rope_3d_delta related parameters.
- append_attention.cu, cpp_extensions.cc: pass the new parameters through to the Python side.
Python side (fastdeploy)
- model_executor/forward_meta.py: add a rope_3d_delta field to ForwardMeta.
- model_executor/layers/rotary_embedding.py: add the 3D -> 1D conversion logic.
- model_executor/layers/attention/{append_attn_backend, flash_mask_attn_backend, ops/append_attention}.py: pass rope_3d_delta along the attention call chain.
- worker/gpu_model_runner.py, worker/input_batch.py: build and maintain rope_3d_delta during scheduling.
- spec_decode/mtp.py, engine/common_engine.py: adapt the speculative decoding / engine paths accordingly.
Tests
- tests/layers/test_append_attention*.py, tests/operators/test_tree_mask.py, tests/deterministic/test_c16_warp1_4_determinism.py: add or fill in the rope_3d_delta arguments so that existing tests keep passing.
Usage
No user-side changes are needed. Multimodal models enable the 3D -> 1D conversion path internally based on the rope_3d configuration.
Checklist