[Feature] Support computing entropy with fastdeploy runner #7954

Open
rain7996 wants to merge 5 commits into PaddlePaddle:develop from rain7996:develop

@rain7996
Contributor

Motivation

Support entropy calculation for fastdeploy runner. The previous implementation had three bugs in the fd-runner + MTP scenario:

  1. ENTROPY-DONE never triggered: When accept_num=0 for a finishing slot, the code skipped the stop_flags check entirely, so entropy was never summarized or cleared.
  2. Incorrect logits indexing: fd-runner's logits shape is [sum(seq_lens_this_time), vocab] (all positions including rejected), but the code treated it as [total_accepted_num, vocab] (accepted-only, which is the ernie5_runner layout).
  3. Warmup pollution: CUDA Graph warmup sends dummy requests with empty req_id. Their entropy values accumulated in entropy_list and were never cleared, contaminating subsequent real requests.
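
The indexing mismatch in item 2 can be illustrated with a minimal pure-Python sketch. It is not the PR's implementation: the helper names `softmax_entropy` and `accepted_row_indices` are hypothetical, logits are toy nested lists rather than Paddle tensors, and the sketch assumes that the accepted tokens of a slot are the first `accept_num` rows of that slot's block in the flattened logits.

```python
import math

def softmax_entropy(row):
    """Shannon entropy (in nats) of softmax(row), computed stably."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def accepted_row_indices(seq_lens_this_time, accept_num):
    """Flat row indices of accepted positions in the fd-runner layout.

    Assumption: each slot owns a contiguous block of seq_lens_this_time[i]
    rows, and its accepted tokens are the first accept_num[i] rows of it.
    """
    indices, offset = [], 0
    for seq_len, accepted in zip(seq_lens_this_time, accept_num):
        indices.extend(range(offset, offset + accepted))
        offset += seq_len
    return indices

# Two slots with 2 and 3 positions this step; 1 and 2 of them are accepted.
seq_lens_this_time = [2, 3]
accept_num = [1, 2]
logits = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5], [3.0, 0.0], [0.0, 0.0]]

accepted = accepted_row_indices(seq_lens_this_time, accept_num)  # -> [0, 2, 3]
entropies = [softmax_entropy(logits[i]) for i in accepted]
```

Reading the same tensor as if it were accepted-only would take rows 0, 1, 2, and row 1 is a rejected position of the first slot.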

Modifications

fastdeploy/model_executor/entropy_utils.py:

  • Add dual-path logic in speculate_calculate_logits_entropy: fd-runner uses accepted_idx to extract correct rows from full logits; ernie5_runner uses pre-filtered logits directly.
  • Move stop_flags check outside the if accept_count > 0 block so ENTROPY-DONE fires even when no tokens are accepted in the final step.
  • Add is_valid_req guard: skip entropy accumulation for warmup requests (empty/whitespace req_id).
  • Remove verbose per-step debug logging; only emit [ENTROPY-DONE] at request completion.
  • Remove unused import time and _mtp_step_counter.
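
The effect of moving the stop_flags check can be sketched as follows. This is a simplified model under stated assumptions, not the code in entropy_utils.py: `process_slot` is a hypothetical helper, one Python list stands in for a slot's entropy_list, and the returned average stands in for the [ENTROPY-DONE] summary.

```python
def process_slot(entropy_list, step_entropies, accept_count, stop_flag):
    """Accumulate accepted-step entropies; summarize and clear on stop.

    Returns the average entropy if the request finished on this step,
    otherwise None.
    """
    if accept_count > 0:
        entropy_list.extend(step_entropies[:accept_count])

    # Checked outside the accept_count block, so a final step with zero
    # accepted tokens still summarizes and clears the slot.
    if stop_flag:
        avg = sum(entropy_list) / len(entropy_list) if entropy_list else None
        entropy_list.clear()
        return avg
    return None

history = []
process_slot(history, [0.2, 0.4], accept_count=2, stop_flag=False)
final_avg = process_slot(history, [], accept_count=0, stop_flag=True)
```

With the check nested inside `if accept_count > 0`, the second call would return without summarizing, and `history` would carry over into the next request on that slot.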

Accuracy Tests

Test configuration: ERNIE5 TP1, block_wise_fp8, fd_runner, no-prefix-cache, temperature=0

fd_runner with overlap enabled vs. disabled (prompt: 水的化学式是什么? ("What is the chemical formula of water?"), max_tokens=10)

| Configuration | all_values | avg_entropy |
|---|---|---|
| overlap enabled | [0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906] | 0.363945 |
| overlap disabled | [0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906] | 0.363945 |
| Comparison | identical across all 10 steps | identical |

Checklist

  • Add at least a tag in the PR title: [BugFix], [Feature]
  • Format your code, run pre-commit before commit.
  • Add unit tests.
  • Provide accuracy results.

@paddle-bot

paddle-bot Bot commented May 28, 2026

Thanks for your contribution!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov-commenter

codecov-commenter commented May 28, 2026

Codecov Report

❌ Patch coverage is 83.72093% with 14 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@60e6223). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/model_executor/entropy_utils.py | 85.54% | 6 Missing and 6 partials ⚠️ |
| fastdeploy/model_executor/pre_and_post_process.py | 0.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7954   +/-   ##
==========================================
  Coverage           ?   67.90%           
==========================================
  Files              ?      467           
  Lines              ?    65271           
  Branches           ?    10030           
==========================================
  Hits               ?    44322           
  Misses             ?    18100           
  Partials           ?     2849           

| Flag | Coverage Δ |
|---|---|
| GPU | 78.18% <83.72%> (?) |
| XPU | 7.06% <0.00%> (?) |


@PaddlePaddle-bot

PaddlePaddle-bot commented May 28, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-30 13:35:45

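The CI report is generated from the following code (updated every 30 minutes):


1 Task overview

1 required task failed (Approval, pending approval); all other required tasks passed.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 42(0) | 42 | 36 | 6 | 0 | 0 | 0 |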

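2 Task status summary

2.1 Required tasks: 9/10 passed

Required tasks block merging; failures must be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Approval | 18s | Approval required | Please approve manually | Job | - |
| ✅ | Remaining 9 required tasks passed | - | - | - | - | - |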

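2.2 Optional tasks: 27/32 passed

Optional tasks do not block merging; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| ❌ | CI_HPU | 1h10m | Job | - |
| ❌ | xpu_unit_test / run_xpu_unit_test | 4m12s | Job | - |
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 2m16s | Job | - |
| ❌ | Check PR Template | 20s | Job | - |
| ❌ | Trigger Jenkins for PR | 17s | Job | - |
| ✅ | Remaining 27 optional tasks passed | - | - | - |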

3 Failure details (required only)

Approval: manual approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is completed.

PaddlePaddle-bot

This comment was marked as outdated.


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-29 18:51:47

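📋 Review summary

PR overview: fixes three entropy-calculation bugs in the fd-runner + MTP scenario (ENTROPY-DONE never triggered, incorrect logits indexing, warmup pollution)
Scope of changes: model_executor/entropy_utils, pre_and_post_process, worker/gpu_model_runner
Impact tags: [Executor] [Speculative Decoding]

Issues

| Level | File | Summary |
|---|---|---|
| ❓ Question | entropy_utils.py:50 | The is_valid_req warmup filter claimed in the PR description is not implemented in the code |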

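Status of previous findings

| Finding | Issue | Status |
|---|---|---|
| F1 | Building accepted_logits with a Python loop is a performance bottleneck | ✅ Fixed (the fd-runner path now uses paddle.index_select) |
| F2 | entropy.pop(0) has O(n) complexity | ✅ Fixed (the fd-runner path now uses index access) |
| F3 | post_process_normal lacks flush_entropy_on_stop | ⚠️ Still present (verified not to be a bug: in post_process_normal, entropy is computed after stop_flags is fully updated, and the function already flushes correctly internally) |
| F4 | Test file hard-codes a relative path | ✅ Fixed (now uses the standard from fastdeploy.model_executor.entropy_utils import ...) |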
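
The F2 change can be pictured with a small sketch that distributes a flat list of per-token entropies to slots. Both helpers are hypothetical and only contrast the two access patterns; they are not taken from the PR.

```python
def assign_by_pop(entropies, accept_num):
    """Consume from the front with pop(0); each pop shifts the remaining
    elements, so every call is O(n) in the list length."""
    remaining = list(entropies)
    return [[remaining.pop(0) for _ in range(n)] for n in accept_num]

def assign_by_index(entropies, accept_num):
    """Walk the list with a cursor; each element is read once by index."""
    per_slot, cursor = [], 0
    for n in accept_num:
        per_slot.append(entropies[cursor:cursor + n])
        cursor += n
    return per_slot

flat = [0.1, 0.2, 0.3]
by_pop = assign_by_pop(flat, [1, 0, 2])
by_index = assign_by_index(flat, [1, 0, 2])
```

Both return the same per-slot grouping; the indexed version avoids the repeated front removals and leaves the input list untouched.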

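📝 PR convention check

The title tag [Feature] is on the official tag list, and the description is structurally complete (Motivation / Modifications / Accuracy Tests / Checklist), but the ## Usage or Command section is missing.

Suggested title (copy-ready):

  • [BugFix] Fix entropy calculation for fd-runner + MTP scenario

Note: the PR description explicitly lists three bug fixes, so [BugFix] is more accurate than [Feature].

Suggested PR description (copy-ready)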
## Motivation
Fix three entropy-calculation bugs in the fd-runner + MTP scenario: ENTROPY-DONE never triggered, incorrect logits indexing, and warmup pollution.

## Modifications
- `fastdeploy/model_executor/entropy_utils.py`: add `calculate_logits_entropy_fd` / `speculate_calculate_logits_entropy_fd` / `flush_entropy_on_stop`; fix the accepted_idx extraction logic, the position of the stop_flags check, and warmup req_id filtering.
- `fastdeploy/model_executor/pre_and_post_process.py`: route to the matching entropy function based on the `EB5_ENABLE_FD_RUNNER` environment variable; call `flush_entropy_on_stop` at the end of the speculate path.
- `fastdeploy/worker/gpu_model_runner.py`: fix `seq_lens_this_time` in `_dummy_prefill_inputs` not being truncated to batch_size.

## Usage or Command
N/A

## Accuracy Tests
ERNIE5 TP1, block_wise_fp8, fd_runner, no-prefix-cache, temperature=0; overlap enabled vs. disabled, results identical across all 10 steps.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

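Overall assessment

The code correctly fixes the core entropy problems on the fd-runner path (logits indexing and ENTROPY-DONE triggering), and previous findings F1/F2/F4 have all been fixed. Please confirm whether the protection against warmup pollution is complete, and update the is_valid_req claim in the PR description accordingly.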



def calculate_logits_entropy(logits, share_inputs, temperature):
    use_fd_runner = os.environ.get("EB5_ENABLE_FD_RUNNER", "0") == "1"

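❓ Question: the PR description states "Add is_valid_req guard: skip entropy accumulation for warmup requests (empty/whitespace req_id)", but the code contains no filtering logic based on req_id.

Warmup pollution is currently prevented only by the seq_lens_this_time[:batch_size] truncation fix in _dummy_prefill_inputs. However, stop_flags=False during warmup, so entropy values still accumulate in entropy_list and are never flushed. If reset_share_inputs is not called after warmup, the entropy of the first real request will be contaminated.

Please confirm:

  1. Is there a mechanism that clears entropy_list after warmup?
  2. Should an is_valid_req filter be added, or should the PR description be updated to remove that claim?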
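
For reference, a guard of the kind the PR description mentions could look roughly like the sketch below. It is hypothetical: the review found no such logic in the code, and the signature of `is_valid_req`, the `accumulate_entropy` helper, and the per-slot list layout are all assumptions made for illustration.

```python
def is_valid_req(req_id):
    """True for real requests; False for empty or whitespace-only ids,
    which is how the PR description characterizes warmup requests."""
    return isinstance(req_id, str) and bool(req_id.strip())

def accumulate_entropy(entropy_lists, slot, req_id, values):
    """Append this step's entropies for a slot unless it is a warmup request."""
    if not is_valid_req(req_id):
        return  # dummy request: leave the slot's history untouched
    entropy_lists[slot].extend(values)

entropy_lists = [[], []]
accumulate_entropy(entropy_lists, 0, "", [0.9, 0.8])     # warmup, skipped
accumulate_entropy(entropy_lists, 1, "req-1", [0.1])     # real request
```

A guard like this keeps warmup values out of the history entirely, so it does not depend on a later flush or reset happening before the first real request.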
