[Scheduler] Simplify scheduler for prefill instances by liyonghua0910 · Pull Request #7944 · PaddlePaddle/FastDeploy

liyonghua0910 · 2026-05-27T12:16:32Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Prefill instances shared the same schedule() with Decode instances, introducing unnecessary waiting-queue and block-allocation overhead. This PR adds a dedicated prefill_schedule() to simplify the Prefill scheduling path.

Modifications

Add prefill_schedule() that skips the waiting queue and directly schedules from running requests with full token budget.
Dispatch to prefill_schedule() when splitwise_role == "prefill".
Fix layer0 signal race in cache_messager.py via pending_layer0_signals buffering.
Support NaiveProposer; fix CUDAGraph replay OOB access (CUDA error 700); fix top_p_list sync and buffer sizing.
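
To make the first bullet concrete, here is a minimal sketch of a prefill-only scheduling step that ignores any waiting queue and spends the whole token budget on running requests. It is not the PR's code: the `Request` dataclass, the `prefill_schedule_sketch` function, and the chunking of a prompt that exceeds the budget are all assumptions made for illustration. Only the names `max_num_batched_tokens` and `num_computed_tokens` come from the review comments on this PR.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical stand-in for a scheduler request; not FastDeploy's class.
    request_id: str
    prompt_len: int               # total prompt tokens that need prefill
    num_computed_tokens: int = 0  # prompt tokens already prefilled


def prefill_schedule_sketch(running, max_num_batched_tokens):
    """Return [(request_id, num_tokens)] scheduled in this step.

    Assumption: a prompt larger than the remaining budget is split
    and continues in the next step.
    """
    token_budget = max_num_batched_tokens
    scheduled = []
    for req in running:
        if token_budget <= 0:
            break  # budget exhausted for this step
        remaining = req.prompt_len - req.num_computed_tokens
        if remaining <= 0:
            continue  # prefill already finished for this request
        num_tokens = min(remaining, token_budget)
        scheduled.append((req.request_id, num_tokens))
        req.num_computed_tokens += num_tokens
        token_budget -= num_tokens
    return scheduled


running = [Request("a", 6), Request("b", 10)]
print(prefill_schedule_sketch(running, 8))  # [('a', 6), ('b', 2)]
print(prefill_schedule_sketch(running, 8))  # [('b', 8)]
```

The sketch performs no block allocation or waiting-queue admission, which is the overhead the Motivation section says the dedicated path avoids.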

Usage or Command

No configuration change needed. Prefill instances automatically use the new path.

Accuracy Tests

No model computation changed. Accuracy unaffected.

Performance

GLM-4.5-Air, TP8, 1P3D, 39 concurrency, 78 multi-round chat sessions:

| Metric | Before | After | Δ |
| --- | --- | --- | --- |
| Total time | 7912s | 6778s | -14.3% |
| Avg TTFT | 29.4s | 23.6s | -19.8% |
| P90 TTFT | 121.3s | 91.7s | -24.4% |

End-to-end throughput +16.7%.
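
The throughput figure is consistent with the total-time row if throughput is taken as work divided by wall-clock time for a fixed workload. That reading is an assumption, since the PR does not say how throughput was measured. A quick check under that assumption (the function name is illustrative):

```python
def relative_changes(before_s, after_s):
    """Return (time change %, throughput change %) for a fixed workload."""
    time_delta_pct = (after_s - before_s) / before_s * 100
    # Same amount of work in less time: throughput scales with before/after.
    throughput_gain_pct = (before_s / after_s - 1) * 100
    return time_delta_pct, throughput_gain_pct


time_delta, throughput_gain = relative_changes(7912.0, 6778.0)
print(f"{time_delta:.1f}% {throughput_gain:.1f}%")  # -14.3% 16.7%
```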

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-27T12:16:39Z

Thanks for your contribution!

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-27 20:33:56

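📋 Review summary

PR overview: Adds a dedicated prefill_schedule() scheduling method for splitwise prefill instances that skips the decode path and simplifies the scheduling logic of P instances.
Scope of changes: fastdeploy/engine/common_engine.py, fastdeploy/engine/sched/resource_manager_v1.py
Affected tags: [Scheduler] [Engine] [PD Disaggregation]

Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | `resource_manager_v1.py:1257` | `assert` is used for a runtime state check and is silently disabled under Python `-O` |
| 🟡 Suggestion | `common_engine.py:1122` | `prefill_schedule()` is implemented only in v1; calling it on other resource manager implementations raises AttributeError |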

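📝 PR convention check

The PR title format is compliant ([Scheduler] is an official tag) ✓. However, every section of the description is empty (only template placeholder comments) and no Checklist item is ticked; both need to be filled in.

Suggested title (can be copied directly):

[Scheduler] Simplify scheduler for prefill instances

Suggested PR description (can be copied directly):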

## Motivation
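Prefill instances (`splitwise_role == "prefill"`) currently reuse the generic `schedule()` method, which contains decode allocation logic and adds unnecessary complexity for pure prefill instances. This PR adds a dedicated `prefill_schedule()` method that removes the unrelated decode scheduling path, uses `max_num_batched_tokens` as the full token budget, and handles only the prefill tasks in the running queue.

## Modifications
- `fastdeploy/engine/common_engine.py`: in `_fetch_request()`, call `resource_manager.prefill_schedule()` when `splitwise_role == "prefill"`; otherwise use the existing `schedule()`.
- `fastdeploy/engine/sched/resource_manager_v1.py`: add a `prefill_schedule()` method that skips decode requests, schedules prefill tasks one by one within the token budget, and updates `num_computed_tokens`.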

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

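Overall assessment

The approach is clear, and splitting out a dedicated scheduling path for prefill instances is a reasonable design. Two points need attention: the safety of the assert, and keeping the other resource manager implementations in sync. Recommend fixing these before merging.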

PaddlePaddle-bot · 2026-05-27T12:36:52Z

+            batch_request = BatchRequest()
+            token_budget = self.config.scheduler_config.max_num_batched_tokens
+
+            assert len(self.waiting) == 0, "Prefill scheduler should not have waiting requests"


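🟡 Suggestion: assert is used to check a runtime state invariant

When Python runs with -O (optimized mode), assert statements are skipped entirely. If self.waiting is non-empty, the assertion does nothing and scheduling then silently skips every waiting request, starving those requests without any log warning.

Consider replacing it with an explicit exception or a log warning:

    if len(self.waiting) != 0:
        llm_logger.warning(
            f"prefill_schedule: unexpected {len(self.waiting)} waiting requests, skipping them."
        )

Or, if this state is unrecoverable:

    if len(self.waiting) != 0:
        raise RuntimeError(
            f"Prefill scheduler should not have waiting requests, got {len(self.waiting)}"
        )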
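
The difference between the two styles can be reproduced without restarting the interpreter. The sketch below uses the built-in `compile()` with its `optimize` argument to mimic running under `-O`. The helper names `load_check` and `check_explicit` are hypothetical and only illustrate the reviewer's point.

```python
SRC = '''
def check(waiting):
    assert len(waiting) == 0, "Prefill scheduler should not have waiting requests"
    return "ok"
'''


def load_check(optimize):
    """Compile SRC at the given optimization level and return its check()."""
    namespace = {}
    # optimize=0 keeps asserts; optimize=1 strips them, like `python -O`.
    exec(compile(SRC, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["check"]


def check_explicit(waiting):
    # An explicit check survives every optimization level.
    if len(waiting) != 0:
        raise RuntimeError(
            f"Prefill scheduler should not have waiting requests, got {len(waiting)}"
        )
    return "ok"


print(load_check(1)(["req-0"]))  # ok  (the assert was compiled away)
try:
    check_explicit(["req-0"])
except RuntimeError as exc:
    print(exc)
```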

PaddlePaddle-bot · 2026-05-27T12:36:52Z

                    self.resource_manager.scheduler_unhandled_request_num = self._get_scheduler_unhandled_request_num()
                # 2. Schedule requests
-                batch_request, error_tasks = self.resource_manager.schedule()
+                if self.cfg.scheduler_config.splitwise_role == "prefill":


🟡 Suggestion: prefill_schedule() is implemented only in resource_manager_v1; calling it on other implementations raises AttributeError

common_engine.py unconditionally calls self.resource_manager.prefill_schedule() for every instance with splitwise_role == "prefill", but the method has so far been added only in resource_manager_v1.py. If the project has other resource manager implementations (such as v2 or future versions), starting one in the prefill role will crash here.

The code already uses a similar defensive pattern (a hasattr check for scheduler_unhandled_request_num). Consider adding the method to the other resource manager implementations, or adding a defensive check:

    if self.cfg.scheduler_config.splitwise_role == "prefill":
        if hasattr(self.resource_manager, "prefill_schedule"):
            batch_request, error_tasks = self.resource_manager.prefill_schedule()
        else:
            batch_request, error_tasks = self.resource_manager.schedule()
    else:
        batch_request, error_tasks = self.resource_manager.schedule()
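
The fallback above can be exercised in isolation. In this sketch, `ManagerV1`, `LegacyManager`, and `run_schedule` are hypothetical stand-ins rather than FastDeploy classes, and the returned tuples only mark which path was taken.

```python
class ManagerV1:
    # Stand-in for a resource manager that has the dedicated prefill path.
    def schedule(self):
        return "schedule", []

    def prefill_schedule(self):
        return "prefill_schedule", []


class LegacyManager:
    # Stand-in for an implementation without prefill_schedule().
    def schedule(self):
        return "schedule", []


def run_schedule(resource_manager, splitwise_role):
    """Use prefill_schedule() when available for prefill roles, else schedule()."""
    if splitwise_role == "prefill":
        prefill = getattr(resource_manager, "prefill_schedule", None)
        if prefill is not None:
            return prefill()
    return resource_manager.schedule()


print(run_schedule(ManagerV1(), "prefill")[0])      # prefill_schedule
print(run_schedule(LegacyManager(), "prefill")[0])  # schedule
print(run_schedule(ManagerV1(), "decode")[0])       # schedule
```

A silent fallback keeps older implementations working but can hide a missing method. Defining `prefill_schedule()` on a shared base class is the stricter alternative the comment also mentions.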

PaddlePaddle-bot · 2026-05-27T12:41:12Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-27 23:53:03

This CI report was generated from the following code (refreshed every 30 minutes):

PR commit: ca4a27b
Merge base: d0a9661 (branch: develop)
View full diff
CI details

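1 Job overview

Currently 8 of 10 Required jobs pass; 2 Required jobs still fail. Approval needs manual approval. The main test job hit an existing analysis cache, and its failure looks more like a write/read race on local R3 routing replay files or an intermittent issue. Recommend approving first and then rerunning.

| Total executions (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 41(0) | 41 | 35 | 6 | 0 | 0 | 0 |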

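2 Job status summary

2.1 Required jobs: 8/10 passed

Required jobs block merging; failures should be handled first.

| Status | Job | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ❌ | `Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage` | 1h23m | Flaky issue: write/read race on R3 routing files | Please rerun; long term, add a readability check or atomic writes | Job | - |
| ❌ | `Approval` | 19s | Approval required | Please obtain manual approval | Job | - |
| ✅ | The other 8 required jobs passed | - | - | - | - | - |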

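2.2 Optional jobs: 27/31 passed

Optional jobs do not block merging; failures are for reference only and are not analyzed in depth.

| Status | Job | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | `Run iluvatar Tests / run_iluvatar_cases` | 1m55s | Job | - |
| ❌ | `Check PR Template` | 23s | Job | - |
| ❌ | `CI_HPU` | 1h21m | Job | - |
| ❌ | `Trigger Jenkins for PR` | 21s | Job | - |
| ✅ | The other 27 optional jobs passed | - | - | - |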

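3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: flaky issue / test failure (confidence: medium)

Status: ❌ Failed
Error type: test failure
Confidence: medium
Root cause summary: write/read race on R3 routing files
Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

| Test | Error | Root cause |
| --- | --- | --- |
| `tests/e2e/test_EB_Lite_serving_R3.py::test_r3_accuracy` | `EOFError: Ran out of input` | The test calls `paddle.load` as soon as `layer_0.pdtensor` appears and probably reads a file that has not been fully written |

Root cause details:
The failure occurs at tests/e2e/utils/rollout_routing_replay_test_utils.py:185, where paddle.load raises EOF while reading ./R3_tmp/routing_replay_output_eb45/r3_chat_completion_stream/layer_0.pdtensor. On the test side, wait_for_file() only checks that the file exists; on the writer side, RoutingStoreLocal.put() calls paddle.save(routing_indices, file_path) directly on the target path, so there is a risk of a read/write race. The prefill_schedule() added by this PR runs only when splitwise_role == "prefill", and the launch arguments of this R3 case do not set --splitwise-role prefill, so no direct link to the changes in this PR was found.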

Key logs:

tests/e2e/test_EB_Lite_serving_R3.py:119
check_routing_replay_chat_completion(...)
tests/e2e/utils/rollout_routing_replay_test_utils.py:185
    generated_routing = paddle.load(cur_routing_path)
E   EOFError: Ran out of input
fastdeploy.log: ConnectionResetError: [Errno 104] Connection reset by peer

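Suggested fixes:

Short term: rerun this Required job; the current failure looks more like a write/read race on local R3 routing replay files or an intermittent failure.
Long term: in the wait logic at tests/e2e/utils/rollout_routing_replay_test_utils.py:133-185, verify that the file can be loaded successfully with paddle.load before continuing, or change fastdeploy/model_executor/layers/moe/routing_indices_cache.py:825 to write to a temporary file and atomically rename it once the write is complete.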
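
Both long-term suggestions can be sketched with the standard library. The example below uses `pickle` in place of `paddle.save` and `paddle.load` so that it runs without Paddle. `atomic_save` and `wait_for_loadable` are hypothetical helper names, not functions from the FastDeploy repository, and the exception types a real `paddle.load` raises on a partial file may differ from the ones caught here.

```python
import os
import pickle
import tempfile
import time


def atomic_save(obj, file_path):
    """Write to a temp file in the same directory, then rename into place."""
    dir_name = os.path.dirname(os.path.abspath(file_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so a reader sees either no file or the complete file.
        os.replace(tmp_path, file_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def wait_for_loadable(file_path, timeout_s=5.0, poll_s=0.05):
    """Poll until the file exists AND deserializes, not merely exists."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with open(file_path, "rb") as f:
                return pickle.load(f)
        except (FileNotFoundError, EOFError, pickle.UnpicklingError):
            time.sleep(poll_s)  # missing or partially written; retry
    raise TimeoutError(f"{file_path} not loadable within {timeout_s}s")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "layer_0.bin")
    atomic_save([1, 2, 3], path)
    print(wait_for_loadable(path))  # [1, 2, 3]
```

Either change on its own would close the race described in the report; applying both guards against writers that are not yet atomic.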

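Fix summary: please rerun; long term, add a readability check or atomic writes

Related changes: the context of fastdeploy/engine/common_engine.py and fastdeploy/engine/sched/resource_manager_v1.py was also read; the PR switches to prefill_schedule() only when splitwise_role == "prefill", and no direct link to the failure path of this R3 case was found.

Link: view logs

Approval: manual approval (confidence: high)

This job requires manual approval; CI continues only after the approval is completed.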

codecov-commenter · 2026-05-27T13:03:50Z

Codecov Report

❌ Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@d0a9661). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/engine/sched/resource_manager_v1.py | 82.35% | 0 Missing and 3 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7944   +/-   ##
==========================================
  Coverage           ?   63.81%           
==========================================
  Files              ?      467           
  Lines              ?    65067           
  Branches           ?     9976           
==========================================
  Hits               ?    41522           
  Misses             ?    20740           
  Partials           ?     2805

| Flag | Coverage Δ |
| --- | --- |
| GPU | `72.88% <85.00%> (?)` |
| XPU | `7.06% <5.00%> (?)` |


[Scheduler] Simplify scheduler for prefill instances

ca4a27b

liyonghua0910 had a problem deploying to Metax_ci May 27, 2026 12:16 — with GitHub Actions Failure

PaddlePaddle-bot reviewed May 27, 2026

liyonghua0910 wants to merge 1 commit into PaddlePaddle:develop from liyonghua0910:develop+20260527_prefill_schedule

3 participants