[Feature] Support multimodal RoPE 3D -> 1D in append_attention by xiaoxiaohehe001 · Pull Request #7932 · PaddlePaddle/FastDeploy

xiaoxiaohehe001 · 2026-05-26T13:20:01Z

Motivation

Support RoPE 3D -> 1D conversion for multimodal scenarios, so that the append_attention family of kernels correctly handles rotary position embeddings for multimodal inputs.

Modifications

Kernel side (custom_ops/gpu_ops/append_attn)

decoder_write_cache_with_rope / encoder_write_cache_with_rope / speculate_write_cache_with_rope: add handling for RoPE 3D -> 1D; the kernels take new rope_3d_position_ids / rope_3d_delta related parameters.
append_attention.cu, cpp_extensions.cc: pass the new parameters through to the Python side.
Template instantiation files are updated accordingly.

Python side (fastdeploy)

model_executor/forward_meta.py: add a rope_3d_delta field to ForwardMeta.
model_executor/layers/rotary_embedding.py: add the 3D -> 1D conversion logic.
model_executor/layers/attention/{append_attn_backend, flash_mask_attn_backend, ops/append_attention}.py: pass rope_3d_delta through the attention call chain.
worker/gpu_model_runner.py, worker/input_batch.py: build and maintain rope_3d_delta during scheduling.
spec_decode/mtp.py, engine/common_engine.py: adapt the speculative decoding / engine paths accordingly.

Tests

tests/layers/test_append_attention*.py, tests/operators/test_tree_mask.py, tests/deterministic/test_c16_warp1_4_determinism.py: add or fill in the rope_3d_delta related arguments so that existing tests keep passing.

Usage

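No user-side changes are required. Multimodal models automatically enable the 3D -> 1D conversion path internally, based on the rope_3d configuration.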

Checklist

Local build passed
Related unit tests passed
Docs/comments updated (if applicable)

paddle-bot · 2026-05-26T13:20:09Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-26T13:49:38Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-29 23:44:13

This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: 5c642c6
Merge base: 0d7fccd (branch: release/online/20260415)
View full diff
CI details

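1 Task overview

1 Required task failed and blocks the merge; it should be handled first.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 20 (0) | 20 | 19 | 1 | 0 | 0 | 0 |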

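2 Task status summary

2.1 Required tasks: 6/7 passed

Required tasks block the merge; failures should be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ❌ | `run_tests_with_coverage` | 1h16m | PR issue: diff-line coverage is 41%, below the 80% threshold | Add unit tests for rotary_embedding.py and mtp.py | Job | - |
| ✅ | The other 6 required tasks passed | - | - | - | - | - |

2.2 Optional tasks: 13/13 passed

Optional tasks do not block the merge; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ✅ | All 13 optional tasks passed | - | - | - |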

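3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: insufficient coverage (confidence: high)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage

Status: ❌ Failed
Error type: insufficient coverage
Confidence: high
Root cause summary: diff-line coverage is 41%, below the 80% threshold; new code in several files lacks unit tests
Analyzer: ci_analyze_unittest_fastdeploy

Uncovered files:

| File | Coverage | Uncovered lines |
| --- | --- | --- |
| `fastdeploy/model_executor/layers/rotary_embedding.py` | 8.3% | L653, L656, L660, L665-L672 |
| `fastdeploy/spec_decode/mtp.py` | 0% | L544-L546, L548, L552 |
| `fastdeploy/worker/gpu_model_runner.py` | 64.7% | L218, L808-L809, L811-L812, L3192 |
| `fastdeploy/engine/common_engine.py` | 50% | L983 |
| `fastdeploy/worker/input_batch.py` | 50% | L372, L502-L503, L505, L707-L708, L710 |

Root cause details:
This PR adds multimodal RoPE 3D→1D support. The new code block at rotary_embedding.py L653-L672 has only 8.3% coverage, and the new code at mtp.py L544-L552 has 0% coverage. All unit tests pass (TEST_EXIT_CODE=0), but overall diff-line coverage is only 41% (51 changed lines, 30 of them uncovered), well below the 80% merge threshold (COVERAGE_EXIT_CODE=9).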
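The arithmetic behind the 41% figure can be checked with a short sketch. It assumes that diff-line coverage is simply covered changed lines divided by total changed lines, and that the gate is a plain comparison against 80%; the helper names `count_lines`, `diff_coverage_percent`, and `passes_gate` are hypothetical and are not part of the CI tooling.

```python
def count_lines(spec: str) -> int:
    """Count the lines named by a spec such as 'L653, L656, L665-L672'."""
    total = 0
    for part in spec.split(","):
        part = part.strip().replace("L", "")
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            total += int(end) - int(start) + 1  # inclusive range
        else:
            total += 1
    return total


def diff_coverage_percent(changed_lines: int, uncovered_lines: int) -> float:
    """Percentage of changed lines that were executed by tests."""
    if changed_lines <= 0:
        raise ValueError("changed_lines must be positive")
    return 100.0 * (changed_lines - uncovered_lines) / changed_lines


def passes_gate(percent: float, threshold: float = 80.0) -> bool:
    """Assumed gate: pass only when coverage reaches the threshold."""
    return percent >= threshold


# Uncovered-line specs copied from the report's table.
uncovered_specs = {
    "rotary_embedding.py": "L653, L656, L660, L665-L672",
    "mtp.py": "L544-L546, L548, L552",
    "gpu_model_runner.py": "L218, L808-L809, L811-L812, L3192",
    "common_engine.py": "L983",
    "input_batch.py": "L372, L502-L503, L505, L707-L708, L710",
}

uncovered_total = sum(count_lines(s) for s in uncovered_specs.values())
percent = diff_coverage_percent(changed_lines=51, uncovered_lines=uncovered_total)

print(uncovered_total)      # 30
print(f"{percent:.1f}%")    # 41.2%
print(passes_gate(percent)) # False
```

The per-file uncovered lines sum to the 30 uncovered lines quoted in the report, and 21 covered lines out of 51 gives roughly 41%.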

Key logs:

COVERAGE_EXIT_CODE: 9
total_percent_covered: 41%  (need ≥ 80%)
rotary_embedding.py : 8.3%  covered (violations: L653,656,660,665-672)
mtp.py              : 0%    covered (violations: L544,545,546,548,552)
gpu_model_runner.py : 64.7% covered (violations: L218,808,809,811,812,3192)
input_batch.py      : 50%   covered (violations: L372,502,503,505,707,708,710)
common_engine.py    : 50%   covered (violations: L983)

Suggested fixes:

Add unit tests under tests/ for the new logic (multimodal 3D→1D RoPE) at fastdeploy/model_executor/layers/rotary_embedding.py L653-L672
Add test coverage for the new code at fastdeploy/spec_decode/mtp.py L544-L552
If these code paths are GPU-only or otherwise hard to execute in the CI environment, a coverage exemption can be requested from the team

Suggested fix summary: add unit tests for rotary_embedding.py L653-672 and mtp.py L544-552

Related changes: fastdeploy/model_executor/layers/rotary_embedding.py, fastdeploy/spec_decode/mtp.py, fastdeploy/worker/gpu_model_runner.py, fastdeploy/worker/input_batch.py, fastdeploy/engine/common_engine.py

Link: view logs

codecov-commenter · 2026-05-29T09:07:11Z

Codecov Report

❌ Patch coverage is 33.33333% with 34 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/online/20260415@0d7fccd). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...stdeploy/model_executor/layers/rotary_embedding.py | 8.33% | 11 Missing ⚠️ |
| fastdeploy/worker/gpu_model_runner.py | 52.94% | 6 Missing and 2 partials ⚠️ |
| fastdeploy/worker/input_batch.py | 42.85% | 7 Missing and 1 partial ⚠️ |
| fastdeploy/spec_decode/mtp.py | 0.00% | 5 Missing ⚠️ |
| fastdeploy/engine/common_engine.py | 0.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@                    Coverage Diff                     @@
##             release/online/20260415    #7932   +/-   ##
==========================================================
  Coverage                           ?   72.31%           
==========================================================
  Files                              ?      388           
  Lines                              ?    54137           
  Branches                           ?     8490           
==========================================================
  Hits                               ?    39148           
  Misses                             ?    12284           
  Partials                           ?     2705

Flag	Coverage Δ
GPU	`72.31% <33.33%> (?)`


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-29 20:46:57

📋 Review summary

PR overview: supports RoPE 3D -> 1D conversion for multimodal scenarios by adding a rope_3d_delta parameter that applies a per-batch position offset in the decode/encode/speculate paths.
Scope of changes: custom_ops/gpu_ops/append_attn/, fastdeploy/model_executor/layers/attention/, fastdeploy/worker/, fastdeploy/spec_decode/
Impact tags: [OP] [Speculative Decoding] [Engine]
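As a rough illustration of the per-batch offset, the sketch below computes 1D RoPE positions using the relation `rope_pos = write_seq_id + rope_3d_delta[bid]` quoted later in this review, then feeds a position into the textbook RoPE rotation of one feature pair. It is plain Python, not the CUDA kernel: the function names, the `base` value, the pair layout, and the sample deltas are all assumptions made for illustration.

```python
import math


def rope_positions(seq_lens, rope_3d_delta):
    """Per-token 1D positions: write_seq_id + rope_3d_delta[bid] for each batch entry."""
    positions = []
    for bid, seq_len in enumerate(seq_lens):
        delta = rope_3d_delta[bid]
        positions.append([write_seq_id + delta for write_seq_id in range(seq_len)])
    return positions


def rotate_pair(x0, x1, pos, pair_idx, head_dim, base=10000.0):
    """Standard RoPE rotation of one (x0, x1) pair at position `pos`.

    The angle is pos * base^(-2 * pair_idx / head_dim). How pairs are laid out
    in memory (NeoX-style or interleaved) is not modeled here.
    """
    theta = pos * base ** (-2.0 * pair_idx / head_dim)
    c, s = math.cos(theta), math.sin(theta)
    return x0 * c - x1 * s, x0 * s + x1 * c


# Hypothetical batch: two requests with different deltas.
positions = rope_positions(seq_lens=[3, 2], rope_3d_delta=[0, 5])
print(positions)  # [[0, 1, 2], [5, 6]]

# Position 0 leaves the pair unchanged; any other position rotates it.
print(rotate_pair(1.0, 0.0, pos=0, pair_idx=0, head_dim=8))  # (1.0, 0.0)
```

With a delta of zero the positions reduce to the plain sequence index, so in this sketch a request without a multimodal offset behaves like ordinary 1D RoPE.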

Issues

No blocking issues found. Some historical findings remain unresolved; see below.

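Status of historical findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | The encoder path lacks a check for the `rope_3d_delta + use_neox_style` combination | ⚠️ Still present |
| F2 | The `# Fix` comment is too vague | ⚠️ Still present |
| F3 | The `rope_3d_delta` guard logic in the encoder/speculate paths is inconsistent with the decoder path | 🔄 Partially fixed |

F3 note: the speculate path now validates rope_3d_delta against NeoX and cache_quant_type (speculate_write_cache_with_rope_kernel.cu:574-585), but the encoder path (encoder_write_cache_with_rope_kernel.h) still lacks the corresponding guard. In addition, the speculate path's if/else structure (if rope_3d_delta ... else if rope_3d ... else ...) differs semantically from the decoder path's nested structure (if rope_3d { if rope_3d_delta ... else ... }): the speculate path would still apply the delta when rope_3d=false. This does not trigger in practice, because the Python side guarantees that rope_3d is True whenever rope_3d_delta is non-empty.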
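The semantic difference between the two branch structures in the F3 note can be made concrete by enumerating every input combination. The sketch below mirrors only the control flow described in the note; the branch labels (`"delta"`, `"batch_offset"`, `"plain"`) and function names are hypothetical stand-ins, not identifiers from the kernels.

```python
from itertools import product


def decoder_style(rope_3d: bool, has_delta: bool) -> str:
    """Nested form: if rope_3d { if rope_3d_delta ... else ... }."""
    if rope_3d:
        if has_delta:
            return "delta"
        return "batch_offset"
    return "plain"


def speculate_style(rope_3d: bool, has_delta: bool) -> str:
    """Flat form: if rope_3d_delta ... else if rope_3d ... else ..."""
    if has_delta:
        return "delta"
    elif rope_3d:
        return "batch_offset"
    return "plain"


# Collect every (rope_3d, has_delta) input on which the two forms disagree.
mismatches = [
    (rope_3d, has_delta)
    for rope_3d, has_delta in product([False, True], repeat=2)
    if decoder_style(rope_3d, has_delta) != speculate_style(rope_3d, has_delta)
]
print(mismatches)  # [(False, True)]
```

The only disagreement is rope_3d=False with a delta present, which is the case the note says the Python side rules out.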

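📝 PR convention check

The target branch is release/online/20260415 (not develop), and the title lacks the [Cherry-Pick] prefix. The PR description uses ### rather than ## for section headings, lacks a ## Accuracy Tests section, and its Checklist items do not match the standard template.

Suggested title (can be copied directly):

[Cherry-Pick][Feature] Support multimodal RoPE 3D -> 1D in append_attention(#<original PR number>)

Suggested PR description (can be copied directly)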

## Motivation
Support RoPE 3D -> 1D conversion for multimodal scenarios, so that the append_attention family of kernels correctly handles rotary position embeddings for multimodal inputs. A new `rope_3d_delta` parameter applies a per-batch position offset in the decode/encode/speculate paths, replacing the previous fixed offset of `ori_bi * max_seq_len * head_size`.

## Modifications
- `custom_ops/gpu_ops/append_attn/decoder_write_cache_with_rope_impl.cuh` / `_kernel.cu` / `_kernel.h`: add the `rope_3d_delta` parameter; inside the kernel, `rope_3d_delta` is used to compute `rope_pos = write_seq_id + rope_3d_delta[bid]`
- `custom_ops/gpu_ops/append_attn/encoder_write_cache_with_rope_impl.cuh` / `_kernel.h`: pass the new `rope_3d_delta` parameter through
- `custom_ops/gpu_ops/append_attn/speculate_write_cache_with_rope_impl.cuh` / `_kernel.cu` / `_kernel.h`: pass the new `rope_3d_delta` parameter through
- `custom_ops/gpu_ops/append_attention.cu`, `cpp_extensions.cc`: pass `rope_3d_delta` through the `AppendAttention` / `AppendAttentionWithOutput` interfaces
- `fastdeploy/model_executor/forward_meta.py`: add a `rope_3d_delta` field to `ForwardMeta`
- `fastdeploy/model_executor/layers/rotary_embedding.py`: add the 3D -> 1D conversion logic
- `fastdeploy/model_executor/layers/attention/{append_attn_backend,flash_mask_attn_backend,ops/append_attention}.py`: pass `rope_3d_delta` through the attention call chain
- `fastdeploy/worker/gpu_model_runner.py`, `fastdeploy/worker/input_batch.py`: build and maintain `rope_3d_delta` during scheduling
- `fastdeploy/spec_decode/mtp.py`, `fastdeploy/engine/common_engine.py`: adapt the speculative decoding / engine paths accordingly
- `tests/layers/test_append_attention*.py`, `tests/operators/test_tree_mask.py`, `tests/deterministic/test_c16_warp1_4_determinism.py`: fill in the `rope_3d_delta` argument

## Usage or Command
N/A (multimodal models enable this automatically based on the `rope_3d` configuration; no user-side changes are required)

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

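Overall assessment

The overall implementation quality is good. The end-to-end chain that passes rope_3d_delta from the Python scheduling layer (allocated in input_batch.py, computed in gpu_model_runner.py) down to the CUDA kernels is complete, and the guard checks on the decoder path are fairly thorough. A follow-up iteration should add the guard checks to the encoder path so that the three paths stay consistent.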

support mm rope3d -> 1d

d427953

fix error

bcaa76c

TBD1 previously approved these changes May 29, 2026

xiaoxiaohehe001 dismissed TBD1’s stale review via 5e697f2 May 29, 2026 10:29

fix ci

12b25e8

xiaoxiaohehe001 force-pushed the rope_3d_1d branch from 5e697f2 to 12b25e8 Compare May 29, 2026 10:45

fix ci

5c642c6

PaddlePaddle-bot reviewed May 29, 2026

xiaoxiaohehe001 wants to merge 4 commits into PaddlePaddle:release/online/20260415 from xiaoxiaohehe001:rope_3d_1d

4 participants
