[Feature] Support computing entropy with fastdeploy runner#7954
rain7996 wants to merge 5 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7954 +/- ##
==========================================
Coverage ? 67.90%
==========================================
Files ? 467
Lines ? 65271
Branches ? 10030
==========================================
Hits ? 44322
Misses ? 18100
Partials ? 2849
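CI report generated from the following code (refreshed every 30 minutes):
1 Task overview
1 required task failed (Approval, pending approval); all other required tasks passed.
2 Task status summary
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 27/32 passed
3 Failure details (required only)
Approval: manual approval required (confidence: high). This job requires manual approval; CI continues only after the approval is granted.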
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-29 18:51:47
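📋 Review summary
PR overview: fixes three bugs in entropy calculation for the fd-runner + MTP scenario (ENTROPY-DONE not triggered, wrong logits indexing, warmup contamination)
Scope of changes: model_executor/entropy_utils, pre_and_post_process, worker/gpu_model_runner
Impact tags: [Executor] [Speculative Decoding]
Issues
| Severity | File | Summary |
|---|---|---|
| ❓ Question | entropy_utils.py:50 | The is_valid_req warmup filter declared in the PR description is not implemented in the code |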
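Status of previous findings
| Finding | Issue | Status |
|---|---|---|
| F1 | Building accepted_logits with a Python loop was a performance bottleneck | ✅ Fixed (the fd-runner path now uses paddle.index_select) |
| F2 | entropy.pop(0) has O(n) complexity | ✅ Fixed (the fd-runner path now uses index access) |
| F3 | post_process_normal lacks flush_entropy_on_stop | In post_process_normal, the entropy calculation runs after stop_flags is fully updated, and the function already flushes correctly internally |
| F4 | Test file hard-coded a relative path | ✅ Fixed (now uses the standard from fastdeploy.model_executor.entropy_utils import ...) |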
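📝 PR convention check
The title tag [Feature] is in the official tag list, and the description is structurally complete (Motivation / Modifications / Accuracy Tests / Checklist), but the ## Usage or Command section is missing.
Suggested title (can be copied directly):
[BugFix] Fix entropy calculation for fd-runner + MTP scenario
Note: the PR description explicitly lists three bug fixes, so [BugFix] is more accurate than [Feature].
Suggested PR description (can be copied directly)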
## Motivation
Fix three bugs in entropy calculation for the fd-runner + MTP scenario: ENTROPY-DONE not triggered, wrong logits indexing, and warmup contamination.
## Modifications
- `fastdeploy/model_executor/entropy_utils.py`: add `calculate_logits_entropy_fd` / `speculate_calculate_logits_entropy_fd` / `flush_entropy_on_stop`; fix the accepted_idx extraction logic, the position of the stop_flags check, and warmup req_id filtering.
- `fastdeploy/model_executor/pre_and_post_process.py`: route to the matching entropy function based on the `EB5_ENABLE_FD_RUNNER` environment variable; call `flush_entropy_on_stop` at the end of the speculate path.
- `fastdeploy/worker/gpu_model_runner.py`: fix `seq_lens_this_time` not being truncated to batch_size in `_dummy_prefill_inputs`.
## Usage or Command
N/A
## Accuracy Tests
ERNIE5 TP1, block_wise_fp8, fd_runner, no-prefix-cache, temperature=0; overlap enabled vs. disabled, the results over 10 steps are identical.
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
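- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The code correctly fixes the core entropy-calculation problems on the fd-runner path (logits indexing and ENTROPY-DONE triggering), and previous findings F1/F2/F4 have all been fixed. Please confirm whether the protection against warmup contamination is complete, and update the is_valid_req statement in the PR description accordingly.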
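The ENTROPY-DONE fix mentioned in this review can be illustrated with a minimal sketch. It is not the code from the PR: the function name `update_entropy`, its arguments, and the use of a mean as the per-request summary are all assumptions made for illustration. The point is only that the stop check sits outside the `accept_count > 0` branch.

```python
def update_entropy(entropy_list, accept_count, step_entropies, stop_flag):
    """Hypothetical helper: accumulate accepted-token entropies, then flush on stop.

    Returns the mean entropy when the request finishes, otherwise None.
    The mean is an assumed summary; the PR may summarize differently.
    """
    if accept_count > 0:
        # Only the accepted positions of this step contribute.
        entropy_list.extend(step_entropies[:accept_count])

    # Checked outside the accept_count branch, so a final step that accepts
    # zero tokens still summarizes and clears the accumulated values.
    if stop_flag:
        summary = sum(entropy_list) / len(entropy_list) if entropy_list else None
        entropy_list.clear()
        return summary
    return None
```

With this ordering, a call such as `update_entropy(values, 0, [], True)` still flushes `values`, which is the case the original code reportedly missed.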
def calculate_logits_entropy(logits, share_inputs, temperature):
    use_fd_runner = os.environ.get("EB5_ENABLE_FD_RUNNER", "0") == "1"
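❓ Question: the PR description states "Add is_valid_req guard: skip entropy accumulation for warmup requests (empty/whitespace req_id)", but the code contains no req_id-based filtering.
Warmup contamination is currently prevented only by the seq_lens_this_time[:batch_size] truncation fix in _dummy_prefill_inputs. During warmup, however, stop_flags=False, so entropy values still accumulate in entropy_list and are never flushed. If reset_share_inputs is not called after warmup, the entropy of the first real request will be contaminated.
Please confirm:
- Is there a mechanism that clears entropy_list after warmup?
- Should the is_valid_req filter be added, or should the PR description be updated to remove that statement?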
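For reference, a req_id-based guard of the kind the PR description mentions could look roughly like the sketch below. Only the name `is_valid_req` and the "empty/whitespace req_id" rule come from the PR text; the helper `accumulate_entropy`, the per-slot list layout, and the argument names are hypothetical.

```python
def is_valid_req(req_id):
    """True for real requests; warmup requests are assumed to carry an
    empty or whitespace-only req_id, as described in the PR text."""
    return isinstance(req_id, str) and req_id.strip() != ""


def accumulate_entropy(entropy_lists, req_ids, entropies):
    """Hypothetical helper: append one entropy value per slot, skipping warmup slots."""
    for slot, (req_id, value) in enumerate(zip(req_ids, entropies)):
        if is_valid_req(req_id):
            entropy_lists[slot].append(value)


# Usage: slot 0 is a real request, slots 1 and 2 look like warmup requests.
lists = [[], [], []]
accumulate_entropy(lists, ["req-1", "", "   "], [0.5, 0.7, 0.9])
```

A guard like this filters at accumulation time, so it does not rely on `entropy_list` being cleared after warmup.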
Motivation
Support entropy calculation for fastdeploy runner. The previous implementation had three bugs in the fd-runner + MTP scenario:
- When `accept_num=0` for a finishing slot, the code skipped the `stop_flags` check entirely, so entropy was never summarized or cleared.
- The fd-runner logits have shape `[sum(seq_lens_this_time), vocab]` (all positions including rejected), but the code treated it as `[total_accepted_num, vocab]` (accepted-only, which is the ernie5_runner layout).
- Warmup requests have an empty or whitespace `req_id`. Their entropy values accumulated in `entropy_list` and were never cleared, contaminating subsequent real requests.
Modifications
`fastdeploy/model_executor/entropy_utils.py`:
- `speculate_calculate_logits_entropy`: fd-runner uses `accepted_idx` to extract correct rows from full logits; ernie5_runner uses pre-filtered logits directly.
- Move the `stop_flags` check outside the `if accept_count > 0` block so ENTROPY-DONE fires even when no tokens are accepted in the final step.
- Add `is_valid_req` guard: skip entropy accumulation for warmup requests (empty/whitespace `req_id`).
- `[ENTROPY-DONE]` at request completion.
- `import time` and `_mtp_step_counter`.
Accuracy Tests
Test configuration: ERNIE5 TP1, block_wise_fp8, fd_runner, no-prefix-cache, temperature=0
fd_runner overlap enabled vs. disabled (prompt: 水的化学式是什么? "What is the chemical formula of water?", max_tokens=10)
[0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906]
[0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906]
Checklist
- `[BugFix]`, `[Feature]`
- `pre-commit` before commit.
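The logits-indexing issue described under Motivation can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: the helper names are hypothetical, the accepted tokens of each request are assumed to occupy the first `accept_num` rows of that request's slice, and the temperature is assumed to be positive (how the PR treats `temperature=0` is not shown here).

```python
import math


def softmax_entropy(row, temperature=1.0):
    """Shannon entropy (in nats) of softmax(row / temperature).

    Assumes temperature > 0.
    """
    scaled = [x / temperature for x in row]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def accepted_row_indices(seq_lens_this_time, accept_num):
    """Row indices of accepted positions inside full logits of shape
    [sum(seq_lens_this_time), vocab].

    Assumption: each request's accepted tokens are the first accept_num
    rows of its slice.
    """
    indices, offset = [], 0
    for seq_len, accepted in zip(seq_lens_this_time, accept_num):
        indices.extend(range(offset, offset + accepted))
        offset += seq_len
    return indices


# Two requests with 3 and 2 draft positions; 2 and 1 tokens accepted.
full_logits = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [0.0, 3.0], [5.0, 5.0]]
rows = accepted_row_indices([3, 2], [2, 1])
entropies = [softmax_entropy(full_logits[i]) for i in rows]
```

Treating `full_logits` as if it were already accepted-only would instead read the first three rows, which mixes a rejected position of the first request into the result.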