[Feature] Support multimodal RoPE 3D -> 1D in append_attention#7932

Open
xiaoxiaohehe001 wants to merge 4 commits into PaddlePaddle:release/online/20260415 from xiaoxiaohehe001:rope_3d_1d

xiaoxiaohehe001 (Collaborator) commented May 26, 2026

Motivation

Support RoPE 3D -> 1D conversion for multimodal scenarios, so that the append_attention family of kernels handles rotary position embeddings for multimodal inputs correctly.

Modifications

Kernel side (custom_ops/gpu_ops/append_attn)

  • decoder_write_cache_with_rope / encoder_write_cache_with_rope / speculate_write_cache_with_rope: add handling for RoPE 3D -> 1D; the kernels take new rope_3d_position_ids / rope_3d_delta related parameters.
  • append_attention.cu, cpp_extensions.cc: pass the new parameters through to the Python side.
  • Template instantiation files are updated accordingly.

Python side (fastdeploy)

  • model_executor/forward_meta.py: add a rope_3d_delta field to ForwardMeta.
  • model_executor/layers/rotary_embedding.py: add the 3D -> 1D conversion logic.
  • model_executor/layers/attention/{append_attn_backend, flash_mask_attn_backend, ops/append_attention}.py: pass rope_3d_delta along the attention call chain.
  • worker/gpu_model_runner.py, worker/input_batch.py: build and maintain rope_3d_delta during the scheduling stage.
  • spec_decode/mtp.py, engine/common_engine.py: adapt the speculative decoding / engine paths accordingly.

Tests

  • tests/layers/test_append_attention*.py, tests/operators/test_tree_mask.py, tests/deterministic/test_c16_warp1_4_determinism.py: add the rope_3d_delta related arguments so that the existing tests keep passing.

Usage

No user-side changes are required. Multimodal models enable the 3D -> 1D conversion path automatically, based on the rope_3d configuration.

Checklist

  • Compiled locally
  • Relevant unit tests pass
  • Documentation/comments updated (where applicable)


paddle-bot Bot commented May 26, 2026

Thanks for your contribution!

PaddlePaddle-bot

This comment was marked as outdated.


PaddlePaddle-bot commented May 26, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-29 23:44:13

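This CI report is generated from the following code (refreshed every 30 minutes):


1 Task overview

1 Required task failed and blocks the merge; it should be handled first.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 20 (0) | 20 | 19 | 1 | 0 | 0 | 0 |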

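2 Task status summary

2.1 Required tasks: 6/7 passed

Required tasks block the merge; failures should be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ❌ | run_tests_with_coverage | 1h16m | PR issue: diff line coverage is 41%, below the 80% threshold | Add unit tests for rotary_embedding.py and mtp.py | Job | - |
| ✅ | The other 6 required tasks passed | - | - | - | - | - |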

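2.2 Optional tasks: 13/13 passed

Optional tasks do not block the merge; failures are informational only.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ✅ | All 13 optional tasks passed | - | - | - |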

3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: insufficient coverage (confidence: high)

  • Status: ❌ Failed
  • Error type: insufficient coverage
  • Confidence: high
  • Root cause summary: diff line coverage is 41%, below the 80% threshold; new code in several files lacks unit tests
  • Analyzer: ci_analyze_unittest_fastdeploy

Uncovered files:

| File | Coverage | Uncovered lines |
| --- | --- | --- |
| fastdeploy/model_executor/layers/rotary_embedding.py | 8.3% | L653, L656, L660, L665-L672 |
| fastdeploy/spec_decode/mtp.py | 0% | L544-L546, L548, L552 |
| fastdeploy/worker/gpu_model_runner.py | 64.7% | L218, L808-L809, L811-L812, L3192 |
| fastdeploy/engine/common_engine.py | 50% | L983 |
| fastdeploy/worker/input_batch.py | 50% | L372, L502-L503, L505, L707-L708, L710 |

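Root cause details:
This PR adds multimodal RoPE 3D→1D support. The new block at rotary_embedding.py L653-L672 has only 8.3% coverage, and the new code at mtp.py L544-L552 has 0%. All unit tests pass (TEST_EXIT_CODE=0), but overall diff line coverage is only 41% (51 changed lines, 30 of them uncovered), well below the 80% merge threshold (COVERAGE_EXIT_CODE=9).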
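The 41% figure can be reproduced from the counts in the report. The following is a minimal sketch, not the CI analyzer's actual code: the helper name `diff_coverage_percent` is hypothetical, and the per-file counts are simply the uncovered lines listed in the table above.

```python
# Uncovered changed lines per file, counted from the report's table
# (e.g. L653, L656, L660, L665-L672 is 3 + 8 = 11 lines).
uncovered = {
    "rotary_embedding.py": 11,
    "mtp.py": 5,
    "gpu_model_runner.py": 6,
    "input_batch.py": 7,
    "common_engine.py": 1,
}

TOTAL_CHANGED_LINES = 51  # as stated in the report
THRESHOLD_PERCENT = 80    # merge threshold stated in the report


def diff_coverage_percent(total_changed, uncovered_lines):
    """Share of changed lines that were executed by the tests (hypothetical helper)."""
    covered = total_changed - uncovered_lines
    return 100.0 * covered / total_changed


total_uncovered = sum(uncovered.values())
percent = diff_coverage_percent(TOTAL_CHANGED_LINES, total_uncovered)
passes = percent >= THRESHOLD_PERCENT

print(f"{total_uncovered} uncovered of {TOTAL_CHANGED_LINES}: "
      f"{percent:.1f}% covered, passes={passes}")
```

With 21 of 51 changed lines covered, the check fails by a wide margin, which is why the report singles out rotary_embedding.py (11 uncovered lines) as the first file to test.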

Key logs:

COVERAGE_EXIT_CODE: 9
total_percent_covered: 41%  (need ≥ 80%)
rotary_embedding.py : 8.3%  covered (violations: L653,656,660,665-672)
mtp.py              : 0%    covered (violations: L544,545,546,548,552)
gpu_model_runner.py : 64.7% covered (violations: L218,808,809,811,812,3192)
input_batch.py      : 50%   covered (violations: L372,502,503,505,707,708,710)
common_engine.py    : 50%   covered (violations: L983)

Suggested fixes:

  1. Add unit tests under tests/ for the new logic (multimodal 3D→1D RoPE) at fastdeploy/model_executor/layers/rotary_embedding.py L653-L672
  2. Add test coverage for the new code at fastdeploy/spec_decode/mtp.py L544-L552
  3. If these code paths are GPU-only or hard to exercise in the CI environment, request a coverage exemption from the team

Fix summary: add unit tests for rotary_embedding.py L653-672 and mtp.py L544-552

Related changes: fastdeploy/model_executor/layers/rotary_embedding.py, fastdeploy/spec_decode/mtp.py, fastdeploy/worker/gpu_model_runner.py, fastdeploy/worker/input_batch.py, fastdeploy/engine/common_engine.py

Link: view log

PaddlePaddle-bot

This comment was marked as outdated.

TBD1 previously approved these changes May 29, 2026

codecov-commenter commented May 29, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 34 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/online/20260415@0d7fccd). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...stdeploy/model_executor/layers/rotary_embedding.py | 8.33% | 11 Missing ⚠️ |
| fastdeploy/worker/gpu_model_runner.py | 52.94% | 6 Missing and 2 partials ⚠️ |
| fastdeploy/worker/input_batch.py | 42.85% | 7 Missing and 1 partial ⚠️ |
| fastdeploy/spec_decode/mtp.py | 0.00% | 5 Missing ⚠️ |
| fastdeploy/engine/common_engine.py | 0.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@                    Coverage Diff                     @@
##             release/online/20260415    #7932   +/-   ##
==========================================================
  Coverage                           ?   72.31%           
==========================================================
  Files                              ?      388           
  Lines                              ?    54137           
  Branches                           ?     8490           
==========================================================
  Hits                               ?    39148           
  Misses                             ?    12284           
  Partials                           ?     2705           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 72.31% <33.33%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

PaddlePaddle-bot

This comment was marked as outdated.


@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-29 20:46:57

📋 Review summary

PR overview: supports RoPE 3D -> 1D conversion for multimodal scenarios; a new rope_3d_delta parameter implements a per-batch position offset in the decode/encode/speculate paths.
Scope of changes: custom_ops/gpu_ops/append_attn/, fastdeploy/model_executor/layers/attention/, fastdeploy/worker/, fastdeploy/spec_decode/
Impact tags: [OP] [Speculative Decoding] [Engine]
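The per-batch offset can be illustrated on the host side. This is a minimal sketch in plain Python, not the CUDA kernel itself: it assumes only the relation quoted in the review's suggested description, `rope_pos = write_seq_id + rope_3d_delta[bid]`. The function name `rope_positions` and the sample values are hypothetical.

```python
def rope_positions(write_seq_ids, batch_ids, rope_3d_delta):
    """Map each token's 1D write position to its RoPE position.

    write_seq_ids: per-token position in its sequence
    batch_ids:     per-token batch index (bid)
    rope_3d_delta: one offset per batch entry
    """
    return [
        seq_id + rope_3d_delta[bid]
        for seq_id, bid in zip(write_seq_ids, batch_ids)
    ]


# Hypothetical inputs: two tokens from batch entry 0, one from entry 1.
positions = rope_positions(
    write_seq_ids=[10, 11, 4],
    batch_ids=[0, 0, 1],
    rope_3d_delta=[-3, 5],
)
print(positions)
```

Tokens that share a batch entry share one delta, so decoding needs only a single scalar per request rather than a per-token 3D position table.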

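Issues

No blocking issues found. Some historical findings remain unfixed; see below.

Status of historical findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | The encoder path lacks validation for the rope_3d_delta + use_neox_style combination | ⚠️ Still present |
| F2 | The # Fix comment is too vague | ⚠️ Still present |
| F3 | The rope_3d_delta guard logic in the encoder/speculate paths is inconsistent with the decoder | 🔄 Partially fixed |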

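F3 note: the speculate path now has NeoX and cache_quant_type checks for rope_3d_delta (speculate_write_cache_with_rope_kernel.cu:574-585), but the encoder path (encoder_write_cache_with_rope_kernel.h) still has no corresponding guard. In addition, the if/else structure of the speculate path (if rope_3d_delta ... else if rope_3d ... else ...) differs semantically from the nested structure of the decoder path (if rope_3d { if rope_3d_delta ... else ... }): speculate would still apply the delta when rope_3d=false. This is not triggered in practice, because the Python side guarantees that rope_3d is True whenever rope_3d_delta is non-empty.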
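The semantic difference between the two branch structures can be checked by enumeration. The sketch below models only the control flow described in the F3 note. The function names and the returned labels are hypothetical, and the real kernels are CUDA, not Python.

```python
def decoder_style(rope_3d, has_delta):
    """Nested form: the delta is only considered inside the rope_3d branch."""
    if rope_3d:
        if has_delta:
            return "delta"
        return "rope_3d"
    return "default"


def speculate_style(rope_3d, has_delta):
    """Flat form: the delta is checked first, regardless of rope_3d."""
    if has_delta:
        return "delta"
    elif rope_3d:
        return "rope_3d"
    return "default"


# Enumerate all four flag combinations and collect the ones that disagree.
mismatches = [
    (rope_3d, has_delta)
    for rope_3d in (False, True)
    for has_delta in (False, True)
    if decoder_style(rope_3d, has_delta) != speculate_style(rope_3d, has_delta)
]
print(mismatches)
```

The two forms disagree only for rope_3d=False with a delta present, which matches the note's point that the difference is latent as long as the Python side never produces that combination.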

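📝 PR convention check

The target branch is release/online/20260415 (not develop), and the title lacks the [Cherry-Pick] prefix. The PR description uses ### rather than ## for section headings, lacks a ## Accuracy Tests section, and its Checklist items do not match the standard template.

Suggested title (can be copied directly):

  • [Cherry-Pick][Feature] Support multimodal RoPE 3D -> 1D in append_attention(#<original PR number>)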
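The title rule applied here can be expressed as a small check. This is a sketch under stated assumptions, not the review bot's implementation: the function name `check_title` is hypothetical, and the rule is inferred from the finding above (a PR targeting a release branch should carry the `[Cherry-Pick]` tag).

```python
def check_title(title, base_branch):
    """Return a list of convention problems for a PR title (hypothetical helper)."""
    problems = []
    # Assumed rule: release-branch PRs are cherry-picks and must say so.
    if base_branch.startswith("release") and not title.startswith("[Cherry-Pick]"):
        problems.append("missing [Cherry-Pick] prefix for a release-branch PR")
    return problems


current = check_title(
    "[Feature] Support multimodal RoPE 3D -> 1D in append_attention",
    "release/online/20260415",
)
suggested = check_title(
    "[Cherry-Pick][Feature] Support multimodal RoPE 3D -> 1D in append_attention",
    "release/online/20260415",
)
print(current, suggested)
```

The current title produces one problem and the suggested title produces none; a PR targeting develop would pass without the prefix.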
Suggested PR description (can be copied directly)
## Motivation
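Support RoPE 3D -> 1D conversion for multimodal scenarios, so that the append_attention family of kernels handles rotary position embeddings for multimodal inputs correctly. A new `rope_3d_delta` parameter implements a per-batch position offset in the decode/encode/speculate paths, replacing the previous fixed offset `ori_bi * max_seq_len * head_size`.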

## Modifications
- `custom_ops/gpu_ops/append_attn/decoder_write_cache_with_rope_impl.cuh` / `_kernel.cu` / `_kernel.h`: add the `rope_3d_delta` parameter; inside the kernel, compute `rope_pos = write_seq_id + rope_3d_delta[bid]` from `rope_3d_delta`
- `custom_ops/gpu_ops/append_attn/encoder_write_cache_with_rope_impl.cuh` / `_kernel.h`: pass the new `rope_3d_delta` parameter through accordingly
- `custom_ops/gpu_ops/append_attn/speculate_write_cache_with_rope_impl.cuh` / `_kernel.cu` / `_kernel.h`: pass the new `rope_3d_delta` parameter through accordingly
- `custom_ops/gpu_ops/append_attention.cu`, `cpp_extensions.cc`: pass `rope_3d_delta` through the `AppendAttention` / `AppendAttentionWithOutput` interfaces
- `fastdeploy/model_executor/forward_meta.py`: add a `rope_3d_delta` field to `ForwardMeta`
- `fastdeploy/model_executor/layers/rotary_embedding.py`: add the 3D -> 1D conversion logic
- `fastdeploy/model_executor/layers/attention/{append_attn_backend,flash_mask_attn_backend,ops/append_attention}.py`: pass `rope_3d_delta` along the attention call chain
- `fastdeploy/worker/gpu_model_runner.py`, `fastdeploy/worker/input_batch.py`: build and maintain `rope_3d_delta` during the scheduling stage
- `fastdeploy/spec_decode/mtp.py`, `fastdeploy/engine/common_engine.py`: adapt the speculative decoding / engine paths accordingly
- `tests/layers/test_append_attention*.py`, `tests/operators/test_tree_mask.py`, `tests/deterministic/test_c16_warp1_4_determinism.py`: add the `rope_3d_delta` arguments

## Usage or Command
N/A (multimodal models enable this automatically based on the `rope_3d` configuration; no user-side changes are required)

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

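Overall assessment

The implementation quality is good overall. The end-to-end path for rope_3d_delta is complete, from the Python scheduling layer (allocated in input_batch.py, computed in gpu_model_runner.py) down to the CUDA kernels, and the guard checks on the decoder path are fairly thorough. A follow-up iteration should add the missing guard checks on the encoder path so that all three paths stay consistent.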

4 participants