Fix score calculation and support neox rope for fleet-gqa-latent #7952
chang-wenbin wants to merge 3 commits into
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           develop    #7952   +/-   ##
==========================================
  Coverage         ?   20.53%
==========================================
  Files            ?      467
  Lines            ?    65181
  Branches         ?    10007
==========================================
  Hits             ?    13383
  Misses           ?    51021
  Partials         ?      777
CI report generated from the following code (updated every 30 minutes):
1 Task overview
Not all Required tasks currently pass:
2 Task status summary
2.1 Required tasks: 5/10 passed
2.2 Optional tasks: 27/31 passed
3 Failure details (required only)
Run Four Cards Tests / run_4_cards_tests: PR issue (confidence: high)
Fix summary: complete the ProposerInputBatch initialization
xpu_4cards_case_test / run_xpu_4cards_cases: PR issue (confidence: high)
Fix summary: complete the ProposerInputBatch initialization
xpu_8cards_case_test / run_xpu_8cards_cases: PR issue (confidence: high)
Fix summary: complete the ProposerInputBatch initialization
Extracted partial CE model tasks to run in CI. / run_ce_cases: PR issue (confidence: high)
Fix summary: complete the ProposerInputBatch initialization
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-29 09:58:01
📋 Review Summary
PR overview: Fixes a bug in the MoE cutlass backend where learnable scaling did not take effect, and adds neox rotary style support to input_batch
Scope of changes: MoE layers, Worker input_batch
Impact tags: [OP] [Engine]
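To illustrate what neox rotary style with a partial rotary dimension means, here is a minimal pure-Python sketch. It assumes the common rotate-half formulation, in which the first `rotary_dim` entries of a head vector are split into two halves and rotated pairwise while the remaining entries pass through unchanged. The function name `neox_rope`, the `base` value, and the toy sizes are hypothetical; this is not the `get_rope` implementation used in the PR.

```python
import math


def neox_rope(vec, position, rotary_dim, base=10000.0):
    """Apply rotate-half (neox-style) RoPE to the first rotary_dim entries of vec.

    Hypothetical helper for illustration only; entries at index >= rotary_dim
    are copied through unchanged.
    """
    half = rotary_dim // 2
    out = list(vec)
    for i in range(half):
        # Frequency for pair i: base^(-2i / rotary_dim)
        theta = position * base ** (-2.0 * i / rotary_dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        # Neox style pairs entry i with entry i + half (not i with i + 1).
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * cos_t - x2 * sin_t
        out[i + half] = x2 * cos_t + x1 * sin_t
    return out


head_dim = 8
rotary_percent = 0.5  # assumed config value for this toy example
rotary_dim = int(rotary_percent * head_dim)

query = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rotated = neox_rope(query, position=3, rotary_dim=rotary_dim)
unrotated = neox_rope(query, position=0, rotary_dim=rotary_dim)
```

With `rotary_percent = 0.5`, only the first half of the head vector is rotated, which is why a hard-coded `head_dim` would produce the wrong rotation width for such a model.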
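Issues
| Level | File | Summary |
|---|---|---|
| 🔴 Bug | input_batch.py:817 | ProposerInputBatch does not set self.rotary_dim, so using it raises AttributeError |
📝 PR Convention Check
The title lacks an official tag, and every section of the description is empty (only the template placeholders remain).
Suggested title (can be copied directly):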
[BugFix] Fix score calculation and support neox rope for fleet-gqa-latent
Suggested PR description (can be copied directly):
## Motivation
Fixes an issue in the MoE cutlass backend where the per-expert scale of `routed_scaling_factor_learnable` never actually took effect (the original code scaled `topk_weights` before `moe_expert_dispatch`, but the dispatch kernel recomputes topk_weights, so the scale was overwritten). Also adds neox rotary style (`rotary_percent`) support to input_batch.
## Modifications
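- `fused_moe_cutlass_backend.py`: moves the application of the learnable per-expert scale from before `moe_expert_dispatch` to after it, so the scale acts on the topk_weights that are actually used
- `input_batch.py`: adds a `rotary_dim` attribute (based on `rotary_percent * head_dim`) that replaces the hard-coded `head_dim` in all `get_rope` calls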
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
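- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The MoE score fix is logically correct and resolves a real bug where learnable scaling did not take effect. However, the ProposerInputBatch subclass does not initialize the rotary_dim attribute, which causes a runtime crash in speculative decoding scenarios and needs to be fixed.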
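The ordering problem behind the MoE score fix can be sketched in pure Python. The sketch assumes, as the review states, that the dispatch step recomputes the top-k weights from the gate logits and therefore discards any scaling applied earlier. The function `hypothetical_dispatch`, the softmax-over-selected-experts normalization, and the toy values are assumptions for illustration and do not reproduce the real `moe_expert_dispatch` kernel.

```python
import math


def topk_softmax(logits, k):
    """Pick the k largest logits and softmax-normalize over the selected experts."""
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in ids]
    total = sum(exps)
    return ids, [e / total for e in exps]


def hypothetical_dispatch(logits, k):
    # Stand-in for a dispatch kernel: it recomputes the top-k weights from the
    # gate logits and ignores any weights computed by the caller beforehand.
    return topk_softmax(logits, k)


gate_logits = [0.1, 2.0, 1.0, -0.5]
per_expert_scale = [1.0, 0.5, 2.0, 1.0]  # learnable per-expert scale (toy values)

# Buggy order: scale first, then dispatch. The scaled weights are never used.
ids_pre, w_pre = topk_softmax(gate_logits, 2)
scaled_too_early = [w * per_expert_scale[i] for i, w in zip(ids_pre, w_pre)]
ids_used, w_used = hypothetical_dispatch(gate_logits, 2)
buggy_weights = w_used  # identical to the unscaled weights

# Fixed order: dispatch first, then scale the weights that are actually used.
fixed_weights = [w * per_expert_scale[i] for i, w in zip(ids_used, w_used)]
```

In the buggy order the downstream computation sees `buggy_weights`, which equal the unscaled weights, while `fixed_weights` carry the per-expert factors.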
@@ -813,7 +817,7 @@ def init_share_inputs(self):
tmp_position_ids = paddle.arange(self.model_config.max_model_len).reshape((1, -1))
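🔴 Bug: ProposerInputBatch inherits from InputBatch, but its __init__ neither calls super().__init__() nor sets self.rotary_dim on its own. When ProposerInputBatch.init_share_inputs() is called in a speculative decoding scenario, accessing self.rotary_dim here raises AttributeError.
Suggested fix: compute rotary_dim in ProposerInputBatch.__init__, consistent with the parent class: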
rotary_percent = getattr(self.model_config, "rotary_percent", 1)
self.rotary_dim = int(rotary_percent * self.model_config.head_dim)
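The failure mode and the suggested fix can be reproduced with a small stand-alone sketch. The classes below are simplified stand-ins that only borrow the names from the review; their constructors, the `SimpleNamespace` config, and the `Broken`/`Fixed` prefixes are assumptions and do not mirror the real FastDeploy classes.

```python
from types import SimpleNamespace


class InputBatch:
    """Simplified stand-in for the parent class."""

    def __init__(self, model_config):
        self.model_config = model_config
        rotary_percent = getattr(model_config, "rotary_percent", 1)
        self.rotary_dim = int(rotary_percent * model_config.head_dim)

    def init_share_inputs(self):
        # The real method passes rotary_dim on to get_rope; here we just read it.
        return self.rotary_dim


class BrokenProposerInputBatch(InputBatch):
    def __init__(self, model_config):
        # No super().__init__() call, so self.rotary_dim is never set.
        self.model_config = model_config


class FixedProposerInputBatch(InputBatch):
    def __init__(self, model_config):
        self.model_config = model_config
        # Same computation as the parent class, as suggested in the review.
        rotary_percent = getattr(self.model_config, "rotary_percent", 1)
        self.rotary_dim = int(rotary_percent * self.model_config.head_dim)


config = SimpleNamespace(head_dim=128, rotary_percent=0.5)

try:
    BrokenProposerInputBatch(config).init_share_inputs()
    broken_raised = False
except AttributeError:
    broken_raised = True

fixed_dim = FixedProposerInputBatch(config).init_share_inputs()
default_dim = FixedProposerInputBatch(SimpleNamespace(head_dim=128)).init_share_inputs()
```

The `getattr` default of 1 keeps the previous behavior for configs without `rotary_percent`, since `rotary_dim` then equals `head_dim`.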