[Feature] Support computing entropy with fastdeploy runner by rain7996 · Pull Request #7954 · PaddlePaddle/FastDeploy

rain7996 · 2026-05-28T15:08:37Z

Motivation

Support entropy calculation for fastdeploy runner. The previous implementation had three bugs in the fd-runner + MTP scenario:

ENTROPY-DONE never triggered: When accept_num=0 for a finishing slot, the code skipped the stop_flags check entirely, so entropy was never summarized or cleared.
Incorrect logits indexing: fd-runner's logits shape is [sum(seq_lens_this_time), vocab] (all positions including rejected), but the code treated it as [total_accepted_num, vocab] (accepted-only, which is the ernie5_runner layout).
Warmup pollution: CUDA Graph warmup sends dummy requests with empty req_id. Their entropy values accumulated in entropy_list and were never cleared, contaminating subsequent real requests.
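
The layout mismatch behind the second bug can be illustrated with a small sketch. This is not the FastDeploy code: `softmax_entropy` and `accepted_rows` are hypothetical helpers, plain Python lists stand in for tensors, and it assumes that each slot's accepted tokens occupy the first `accept_num[i]` rows of that slot's segment in the flattened logits.

```python
import math

def softmax_entropy(row, temperature=1.0):
    """Shannon entropy (in nats) of softmax(row / temperature); temperature must be > 0."""
    scaled = [x / temperature for x in row]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def accepted_rows(seq_lens_this_time, accept_num):
    """Row indices of accepted tokens in a [sum(seq_lens_this_time), vocab] layout.

    Assumption: accepted tokens are a prefix of each slot's segment.
    """
    rows, offset = [], 0
    for seq_len, n_acc in zip(seq_lens_this_time, accept_num):
        rows.extend(range(offset, offset + n_acc))
        offset += seq_len  # skip rejected positions of this slot
    return rows

seq_lens_this_time = [3, 2]   # slot 0 has 3 positions, slot 1 has 2
accept_num = [2, 1]           # accepted tokens per slot

full_layout_rows = accepted_rows(seq_lens_this_time, accept_num)
# Reading the same tensor as if it were accepted-only picks the first
# sum(accept_num) rows, which includes a rejected position of slot 0.
accepted_only_rows = list(range(sum(accept_num)))
```

With these inputs the two index lists differ, which is the kind of mismatch the PR describes between the fd-runner and ernie5_runner layouts.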

Modifications

fastdeploy/model_executor/entropy_utils.py:

Add dual-path logic in speculate_calculate_logits_entropy: fd-runner uses accepted_idx to extract correct rows from full logits; ernie5_runner uses pre-filtered logits directly.
Move stop_flags check outside the if accept_count > 0 block so ENTROPY-DONE fires even when no tokens are accepted in the final step.
Add is_valid_req guard: skip entropy accumulation for warmup requests (empty/whitespace req_id).
Remove verbose per-step debug logging; only emit [ENTROPY-DONE] at request completion.
Remove unused import time and _mtp_step_counter.
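
The control-flow change described above (stop check outside the `accept_count > 0` branch, plus a request-id guard) can be sketched as follows. The function `accumulate_step` and its signature are hypothetical and chosen only for illustration; `is_valid_req` mirrors the name used in this description, but its body here is an assumption.

```python
def is_valid_req(req_id):
    """Treat empty or whitespace-only request ids as warmup requests (assumption)."""
    return isinstance(req_id, str) and req_id.strip() != ""

def accumulate_step(entropy_list, slot, req_id, step_entropies, stop_flag):
    """Accumulate one step; return the average entropy when the request finishes, else None."""
    if not is_valid_req(req_id):
        return None  # warmup request: never touches entropy_list

    accept_count = len(step_entropies)
    if accept_count > 0:
        entropy_list[slot].extend(step_entropies)

    # The stop check is NOT nested under accept_count > 0, so a final step
    # that accepts no tokens still summarizes and clears the slot.
    if stop_flag:
        values = entropy_list[slot]
        entropy_list[slot] = []
        return sum(values) / len(values) if values else None
    return None

entropy_list = [[]]
first = accumulate_step(entropy_list, 0, "req-1", [0.2, 0.4], stop_flag=False)
final = accumulate_step(entropy_list, 0, "req-1", [], stop_flag=True)
```

In this sketch the second call accepts no tokens but still returns the average of the values gathered earlier and leaves the slot empty.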

Accuracy Tests

Test configuration: ERNIE5 TP1, block_wise_fp8, fd_runner, no-prefix-cache, temperature=0

fd_runner overlap enabled vs. disabled (prompt: 水的化学式是什么？ ["What is the chemical formula of water?"], max_tokens=10)

Configuration	all_values	avg_entropy
Overlap enabled	`[0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906]`	0.363945
Overlap disabled	`[0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906]`	0.363945
Comparison	Identical across all 10 steps	Identical

Checklist

Add at least a tag in the PR title: [BugFix], [Feature]
Format your code, run pre-commit before commit.
Add unit tests.
Provide accuracy results.

paddle-bot · 2026-05-28T15:08:48Z

Thanks for your contribution!

CLAassistant · 2026-05-28T15:08:52Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov-commenter · 2026-05-28T15:45:04Z

Codecov Report

❌ Patch coverage is 83.72093% with 14 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@60e6223). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/model_executor/entropy_utils.py	85.54%	6 Missing and 6 partials ⚠️
fastdeploy/model_executor/pre_and_post_process.py	0.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7954   +/-   ##
==========================================
  Coverage           ?   67.90%           
==========================================
  Files              ?      467           
  Lines              ?    65271           
  Branches           ?    10030           
==========================================
  Hits               ?    44322           
  Misses             ?    18100           
  Partials           ?     2849

Flag	Coverage Δ
GPU	`78.18% <83.72%> (?)`
XPU	`7.06% <0.00%> (?)`

Flags with carried forward coverage won't be shown.

PaddlePaddle-bot · 2026-05-28T16:42:30Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-30 13:35:45

This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: 80e9c11
Merge base: 60e6223 (branch: develop)
View full diff
CI details

1 Task overview

1 required task failed (Approval, awaiting approval); all other required tasks passed.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped
42(0)	42	36	6	0	0	0

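2 Task status summary

2.1 Required tasks: 9/10 passed

Required tasks block merging; their failures should be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Approval`	18s	Approval required	Obtain manual approval	Job	-
✅	Remaining 9 required tasks passed	-	-	-	-	-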

2.2 Optional tasks: 27/32 passed

Optional tasks do not block merging; their failures are for reference only.

Status	Task	Duration	Log	Rerun
❌	`CI_HPU`	1h10m	Job	-
❌	`xpu_unit_test / run_xpu_unit_test`	4m12s	Job	-
❌	`Run iluvatar Tests / run_iluvatar_cases`	2m16s	Job	-
❌	`Check PR Template`	20s	Job	-
❌	`Trigger Jenkins for PR`	17s	Job	-
✅	Remaining 27 optional tasks passed	-	-	-

3 Failure details (required only)

Approval: manual approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is granted.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-29 18:51:47

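📋 Review summary

PR overview: fixes three entropy-calculation bugs in the fd-runner + MTP scenario (ENTROPY-DONE never triggered, incorrect logits indexing, warmup pollution)
Scope of changes: model_executor/entropy_utils, pre_and_post_process, worker/gpu_model_runner
Impact tags: [Executor] [Speculative Decoding]

Issues

Severity	File	Summary
❓ Question	`entropy_utils.py:50`	The `is_valid_req` warmup filter declared in the PR description is not implemented in the code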

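Status of earlier findings

Finding	Issue	Status
F1	Building accepted_logits with a Python loop is a performance bottleneck	✅ Fixed (the fd-runner path now uses `paddle.index_select`)
F2	`entropy.pop(0)` has O(n) complexity	✅ Fixed (the fd-runner path now uses index access)
F3	`post_process_normal` lacks `flush_entropy_on_stop`	⚠️ Still present (but verified not to be a bug: in `post_process_normal` the entropy calculation runs after stop_flags has been fully updated, and the function already flushes correctly internally)
F4	Test file hard-codes a relative path	✅ Fixed (now uses the standard `from fastdeploy.model_executor.entropy_utils import ...`)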
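
Finding F2 can be illustrated with a minimal sketch that splits a flat list of per-token entropies into per-slot chunks. It is not the code from the PR: both function names are hypothetical, and plain lists stand in for the real buffers. Each `list.pop(0)` shifts every remaining element, while a running start index avoids that cost.

```python
def split_by_pop(entropy, accept_num):
    """Consume values from the front with pop(0); each pop is O(n) on a list."""
    values = list(entropy)  # copy so the caller's list is not mutated
    chunks = []
    for n in accept_num:
        chunks.append([values.pop(0) for _ in range(n)])
    return chunks

def split_by_index(entropy, accept_num):
    """Same result using a running start index; each value is read once."""
    chunks, start = [], 0
    for n in accept_num:
        chunks.append(entropy[start:start + n])
        start += n
    return chunks

entropy = [0.1, 0.2, 0.3, 0.4]
accept_num = [1, 0, 3]  # a slot may accept no tokens in a step
by_pop = split_by_pop(entropy, accept_num)
by_index = split_by_index(entropy, accept_num)
```

Both helpers return the same chunks; only the access pattern differs.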

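📝 PR convention check

The title tag [Feature] is on the official tag list and the description is structurally complete (Motivation / Modifications / Accuracy Tests / Checklist), but the ## Usage or Command section is missing.

Suggested title (can be copied as-is):

[BugFix] Fix entropy calculation for fd-runner + MTP scenario

Note: the PR description explicitly lists three bug fixes, so [BugFix] is more accurate than [Feature].

Suggested PR description (can be copied as-is):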

## Motivation
Fix three entropy-calculation bugs in the fd-runner + MTP scenario: ENTROPY-DONE never triggered, incorrect logits indexing, and warmup pollution.

## Modifications
- `fastdeploy/model_executor/entropy_utils.py`: add `calculate_logits_entropy_fd` / `speculate_calculate_logits_entropy_fd` / `flush_entropy_on_stop`; fix the accepted_idx extraction logic, the position of the stop_flags check, and warmup req_id filtering.
- `fastdeploy/model_executor/pre_and_post_process.py`: route to the matching entropy function based on the `EB5_ENABLE_FD_RUNNER` environment variable; call `flush_entropy_on_stop` at the end of the speculate path.
- `fastdeploy/worker/gpu_model_runner.py`: fix `seq_lens_this_time` not being truncated to batch_size in `_dummy_prefill_inputs`.

## Usage or Command
N/A

## Accuracy Tests
ERNIE5 TP1, block_wise_fp8, fd_runner, no-prefix-cache, temperature=0; overlap enabled vs. disabled, results identical across all 10 steps.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

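Overall assessment

The code correctly fixes the core entropy-calculation problems on the fd-runner path (logits indexing and ENTROPY-DONE triggering), and earlier findings F1/F2/F4 have all been fixed. Please confirm that the protection against warmup pollution is complete, and update the PR description's statement about is_valid_req accordingly.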

PaddlePaddle-bot · 2026-05-29T10:53:29Z

+
+
 def calculate_logits_entropy(logits, share_inputs, temperature):
+    use_fd_runner = os.environ.get("EB5_ENABLE_FD_RUNNER", "0") == "1"


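❓ Question: The PR description states "Add is_valid_req guard: skip entropy accumulation for warmup requests (empty/whitespace req_id)", but the code contains no req_id-based filtering.

Protection against warmup pollution currently relies only on the seq_lens_this_time[:batch_size] truncation fix in _dummy_prefill_inputs. During warmup, however, stop_flags=False, so entropy values still accumulate in entropy_list and are never flushed. If reset_share_inputs is not called after warmup, the first real request's entropy will be polluted.

Please confirm:

Is there a mechanism that clears entropy_list after warmup?

Should an is_valid_req filter be added, or should the PR description be updated to remove that claim?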
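
The pollution scenario raised in this comment can be reproduced in a toy model. The sketch below is only an illustration under stated assumptions: `warmup`, `finish_request`, and `clear_entropy` are hypothetical helpers, a list of lists stands in for `entropy_list`, and the numbers are made up rather than taken from FastDeploy.

```python
def warmup(entropy_list, slot, dummy_entropies):
    """Warmup step: values accumulate, and with stop_flags=False nothing flushes them."""
    entropy_list[slot].extend(dummy_entropies)

def finish_request(entropy_list, slot, step_entropies):
    """Accumulate a real request's values, then summarize and clear the slot."""
    entropy_list[slot].extend(step_entropies)
    values = entropy_list[slot]
    entropy_list[slot] = []
    return sum(values) / len(values)

def clear_entropy(entropy_list):
    """Hypothetical post-warmup cleanup: empty every slot."""
    for slot in range(len(entropy_list)):
        entropy_list[slot] = []

# Without cleanup, warmup values leak into the first real request's average.
polluted_list = [[]]
warmup(polluted_list, 0, [4.0, 4.0])
polluted_avg = finish_request(polluted_list, 0, [1.0, 1.0])

# With cleanup after warmup, the average reflects only the real request.
clean_list = [[]]
warmup(clean_list, 0, [4.0, 4.0])
clear_entropy(clean_list)
clean_avg = finish_request(clean_list, 0, [1.0, 1.0])
```

Either a cleanup step after warmup or a request-id filter during accumulation would keep the dummy values out of the average in this toy model; which one fits the real runner is the question the reviewer is asking.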

rain7996 and others added 3 commits May 28, 2026 22:13

[feature] support computing entropy in fd runner

149a39b

update unittest

5b6c3de

Merge branch 'PaddlePaddle:develop' into develop

dc1d8fd

rain7996 had a problem deploying to Metax_ci May 28, 2026 15:08 — with GitHub Actions Failure

rain7996 added 2 commits May 29, 2026 18:30

optimize

13bf9c6

update

80e9c11

rain7996 had a problem deploying to Metax_ci May 29, 2026 10:30 — with GitHub Actions Failure

PaddlePaddle-bot reviewed May 29, 2026

rain7996 wants to merge 5 commits into PaddlePaddle:develop from rain7996:develop

4 participants


