delete useless code #7959
Thanks for your contribution!
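CI report generated from the following code (refreshed every 30 minutes):
1 Task overview
There is 1 failed required task; it must be resolved before the PR can be merged.
2 Task status summary
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 28/32 passed
3 Failure details (required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test failure (confidence: medium)
Failed cases:
Root cause details: failed-case coverage
Key logs:
Suggested fix:
Fix summary: restore the decoder/speculate cache-write branches
Related changes:
Link: View logs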
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7959 +/- ##
==========================================
Coverage ? 67.52%
==========================================
Files ? 467
Lines ? 65179
Branches ? 10007
==========================================
Hits ? 44013
Misses ? 18349
Partials ? 2817
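The percentage in the table can be reproduced from the raw counts. The sketch below is only illustrative: it assumes coverage is computed as hits / (hits + misses + partials) and that the displayed value is truncated to two decimals rather than rounded to nearest. Both points are assumptions about how the report is produced, and `coverage_hundredths_of_percent` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: coverage expressed in hundredths of a percent.
// Assumption: coverage = hits / (hits + misses + partials), truncated.
std::int64_t coverage_hundredths_of_percent(std::int64_t hits,
                                            std::int64_t misses,
                                            std::int64_t partials) {
    const std::int64_t lines = hits + misses + partials;
    assert(lines > 0);
    // Integer division truncates toward zero, so 67.526...% becomes 6752.
    return hits * 10000 / lines;
}
```

With the counts from the table (44013 hits, 18349 misses, 2817 partials), the three counts sum to the reported 65179 lines and the helper returns 6752, i.e. 67.52%.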
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-29 13:51:55
📋 Review summary
PR overview: removes the redundant Decoder/Speculate KV cache-write code paths, unifies the encoder/decoder cache write in EncoderWriteCacheWithRopeKernel, and runs it unconditionally
Scope of change: custom_ops/gpu_ops/append_attention.cu, custom_ops/gpu_ops/append_attn/get_block_shape_and_split_kv_block.cu
Impact tags: [OP] [KVCache]
Issues
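| Level | File | Summary |
|---|---|---|
| ❓ Question | append_attention.cu | EncoderWriteCacheWithRopeKernel is now called unconditionally and its 4th argument is changed to seq_lens_this_time; behavior for pure-decode batches needs confirmation |
| ❓ Question | get_block_shape_and_split_kv_block.cu | split_kv_block is hoisted to run unconditionally and switched to seq_lens_this_time; correctness of the kv_batch_ids fill for pure-decode batches needs confirmation |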
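Question 1: append_attention.cu
EncoderWriteCacheWithRopeKernel is now called unconditionally, outside `if (max_enc_len_this_time > 0)`, and its 4th argument (previously seq_lens_encoder) is changed to seq_lens_this_time:
```cpp
// New code (both call sites pass seq_lens_this_time)
EncoderWriteCacheWithRopeKernel<...>(
    meta_data, qkv,
    seq_lens_this_time, // param3: seq_lens_this_time ✓
    seq_lens_this_time, // param4: previously seq_lens_encoder ← changed
    seq_lens_decoder, ...);
```
Inside the function (encoder_write_cache_with_rope_kernel.h), this argument is passed to rotary_qk_variable and to CascadeAppendWriteCacheKVQKV; the latter's underlying cache_kernel uses it to decide whether to skip a request:
```cpp
if (seq_lens[ori_bi] == 0) continue; // original semantics: skip when there are no encoder tokens
```
In a pure-decode batch (seq_lens_encoder[i] == 0 but seq_lens_this_time[i] = 1), the old code never entered this kernel because of the `if (max_enc_len_this_time > 0)` guard. The new code enters it, and since seq_lens[ori_bi] = 1 ≠ 0, the request is not skipped. DecoderWriteCacheWithRoPEKernel has also been deleted, so please confirm that EncoderWriteCacheWithRopeKernel fully replaces its functionality (in particular, whether the ori_seq_id computation matches the decoder case).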
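To make the changed skip behavior concrete, here is a minimal host-side sketch. It is a toy model and not the kernel itself: `batches_written` and `old_guard_enters_kernel` are hypothetical helpers that only mirror the predicate and the guard quoted in the review.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of the quoted check `if (seq_lens[ori_bi] == 0) continue;`.
// Returns the batch indices that would NOT be skipped.
std::vector<int> batches_written(const std::vector<int>& seq_lens) {
    std::vector<int> written;
    for (int bi = 0; bi < static_cast<int>(seq_lens.size()); ++bi) {
        if (seq_lens[bi] == 0) continue;  // same predicate as the quoted line
        written.push_back(bi);
    }
    return written;
}

// Toy model of the old guard `if (max_enc_len_this_time > 0)`:
// the kernel was only entered when some request had encoder tokens.
bool old_guard_enters_kernel(const std::vector<int>& seq_lens_encoder) {
    int max_enc_len_this_time = 0;
    for (int len : seq_lens_encoder) {
        max_enc_len_this_time = std::max(max_enc_len_this_time, len);
    }
    return max_enc_len_this_time > 0;
}
```

For a pure-decode batch of three requests, seq_lens_encoder = {0, 0, 0} and seq_lens_this_time = {1, 1, 1}. In this model the old guard never enters the kernel, while passing seq_lens_this_time leaves all three requests unskipped, which is exactly the case the question asks the author to confirm.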
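Question 2: get_block_shape_and_split_kv_block.cu
The 2nd argument of split_kv_block is changed to seq_lens_this_time, and the entire KV block splitting logic is hoisted out of `if (max_enc_len_this_time > 0)` so that it runs unconditionally. Please confirm:
- For a pure-decode batch (every request is a 1-token decoder step), whether the kv_batch_ids / kv_tile_ids_per_batch filled by split_kv_block are consumed correctly by the quantized cache-write paths inside EncoderWriteCacheWithRopeKernel (CascadeAppendWriteCacheKVC8QKV / CascadeAppendWriteCacheKVC4QKV).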
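The effect of switching the length array can be illustrated with a simplified host-side model of the tiling. This is a sketch under stated assumptions, not the actual split_kv_block kernel: it assumes each request contributes ceil(len / rows_per_tile) tiles, that zero-length requests contribute none, and it ignores any decoder offset the real kernel may apply. `split_kv_block_model`, `KvTiling`, and the tile size of 64 are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for the outputs named in the review.
struct KvTiling {
    std::vector<int> kv_batch_ids;           // owning request of each tile
    std::vector<int> kv_tile_ids_per_batch;  // tile index within that request
    int num_blocks_x = 0;                    // total number of tiles
};

// Simplified model: one entry per tile, ceil(len / rows_per_tile) tiles
// per request; requests with length 0 produce no tiles.
KvTiling split_kv_block_model(const std::vector<int>& seq_lens,
                              int rows_per_tile) {
    assert(rows_per_tile > 0);
    KvTiling out;
    for (int bi = 0; bi < static_cast<int>(seq_lens.size()); ++bi) {
        const int tiles = (seq_lens[bi] + rows_per_tile - 1) / rows_per_tile;
        for (int t = 0; t < tiles; ++t) {
            out.kv_batch_ids.push_back(bi);
            out.kv_tile_ids_per_batch.push_back(t);
        }
        out.num_blocks_x += tiles;
    }
    return out;
}
```

Under this model, a pure-decode batch yields zero tiles when driven by seq_lens_encoder = {0, 0} and one tile per request when driven by seq_lens_this_time = {1, 1}. The quantized write paths therefore start receiving non-empty kv_batch_ids in a situation where they previously received none.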
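📝 PR convention check
The title is missing an official tag, the Motivation / Modifications / Usage or Command / Accuracy Tests sections are all empty, and no Checklist item is checked.
Suggested title (can be copied as-is):
[OP] Delete useless decoder and speculate write cache kernel calls
Suggested PR description (can be copied as-is):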
## Motivation
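Unify the three KV cache-write paths (encoder / decoder / speculate), remove the redundant `SpeculateWriteCacheWithRoPEKernel` and `DecoderWriteCacheWithRoPEKernel` calls, and simplify the `AppendAttentionKernel` logic. In `GetBlockShapeAndSplitKVBlock`, also run the KV block split unconditionally and switch it to `seq_lens_this_time`, in line with the unified path in the caller.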
## Modifications
- `custom_ops/gpu_ops/append_attention.cu`: hoist the `EncoderWriteCacheWithRopeKernel` call out of `if (max_enc_len_this_time > 0)` so that it runs unconditionally; change the 4th argument to `seq_lens_this_time`; remove the 4 call sites of `SpeculateWriteCacheWithRoPEKernel` and `DecoderWriteCacheWithRoPEKernel`
- `custom_ops/gpu_ops/append_attn/get_block_shape_and_split_kv_block.cu`: change the KV block split (`split_kv_block`) from an encoder-only conditional block to unconditional execution; switch its argument to `seq_lens_this_time`; add a CUDAGraph capture guard macro around `kv_num_blocks_x_cpu.copy_`
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
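- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The direction of the PR is clear: merging the three KV cache-write paths into one is a reasonable refactor and substantially reduces the amount of code. However, the semantic correctness of the unified EncoderWriteCacheWithRopeKernel for pure-decode batches (the changed meaning of the 4th argument plus the unconditional call) should be explained in the PR description or covered by additional tests, so that reviewers can confirm functional equivalence.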
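One way to phrase the equivalence asked for above is as a comparison of which cache slots get written. The following is a toy host-side model, not FastDeploy code. It assumes that every request writes its new tokens contiguously starting at position seq_lens_decoder[i], and that for prefill requests seq_lens_this_time[i] equals seq_lens_encoder[i]. All helper names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Slot = std::pair<int, int>;  // (batch index, token position in cache)

// Toy model: request bi writes n tokens starting at position start.
static void write_tokens(std::set<Slot>& slots, int bi, int start, int n) {
    for (int t = 0; t < n; ++t) slots.insert({bi, start + t});
}

// Modelled old layout: encoder requests go through the encoder path,
// all other active requests through the decoder/speculate path.
std::set<Slot> old_paths(const std::vector<int>& enc,
                         const std::vector<int>& dec,
                         const std::vector<int>& cur) {
    assert(enc.size() == dec.size() && dec.size() == cur.size());
    std::set<Slot> slots;
    for (int bi = 0; bi < static_cast<int>(enc.size()); ++bi) {
        if (enc[bi] > 0) write_tokens(slots, bi, dec[bi], enc[bi]);
        else if (cur[bi] > 0) write_tokens(slots, bi, dec[bi], cur[bi]);
    }
    return slots;
}

// Modelled unified path: driven only by seq_lens_this_time; a length of
// zero writes nothing, mirroring the skip check.
std::set<Slot> unified_path(const std::vector<int>& dec,
                            const std::vector<int>& cur) {
    assert(dec.size() == cur.size());
    std::set<Slot> slots;
    for (int bi = 0; bi < static_cast<int>(dec.size()); ++bi) {
        write_tokens(slots, bi, dec[bi], cur[bi]);
    }
    return slots;
}
```

A real test would compare actual cache contents produced by the kernels before and after the change; this model only shows the property such a test would assert for mixed, pure-decode, and speculative batches.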