[Scheduler] Simplify scheduler for prefill instances #7944
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-27 20:33:56
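📋 Review Summary
PR overview: Adds a dedicated prefill_schedule() scheduling method for splitwise prefill instances. It skips the decode path and simplifies the scheduling logic on P instances.
Scope of changes: fastdeploy/engine/common_engine.py, fastdeploy/engine/sched/resource_manager_v1.py
Impact tags: [Scheduler] [Engine] [PD Disaggregation]
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | resource_manager_v1.py:1257 | assert is used for a runtime state check and is silently disabled under Python -O |
| 🟡 Suggestion | common_engine.py:1122 | prefill_schedule() is implemented only in v1; calling it on any other resource manager implementation raises AttributeError |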
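📝 PR Convention Check
The PR title format is compliant ([Scheduler] is an official tag) ✓. However, every section of the description is empty (it contains only the template placeholder comments) and no Checklist item is ticked. Both need to be completed.
Suggested title (can be copied directly):
[Scheduler] Simplify scheduler for prefill instances
Suggested PR description (can be copied directly):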
## Motivation
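Prefill instances (`splitwise_role == "prefill"`) currently reuse the general `schedule()` method. That method contains decode allocation logic, which adds unnecessary complexity for pure prefill instances. This PR adds a dedicated `prefill_schedule()` method that removes the unrelated decode scheduling path, uses `max_num_batched_tokens` as the full token budget, and only processes prefill tasks in the running queue.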
## Modifications
- `fastdeploy/engine/common_engine.py`: in `_fetch_request()`, call `resource_manager.prefill_schedule()` when `splitwise_role == "prefill"`; otherwise keep using the existing `schedule()`.
- `fastdeploy/engine/sched/resource_manager_v1.py`: add a `prefill_schedule()` method that skips decode requests, schedules prefill tasks one by one within the token budget, and updates `num_computed_tokens`.
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
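- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall Assessment
The approach is clear, and splitting out a dedicated scheduling path for prefill instances is a reasonable design. Two points need attention: the safety of the assert, and keeping the other resource manager implementations in sync. Merging is recommended once these are fixed.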
batch_request = BatchRequest()
token_budget = self.config.scheduler_config.max_num_batched_tokens

assert len(self.waiting) == 0, "Prefill scheduler should not have waiting requests"
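🟡 Suggestion: assert is used to check a runtime state invariant
When Python runs with -O (optimized mode), assert statements are skipped entirely. If self.waiting is non-empty, the assertion has no effect, and the scheduling code that follows silently skips every waiting request. Those requests starve, and no log warning is emitted.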
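The -O behavior described above can be checked locally with a minimal sketch. It is not taken from the PR: the snippet string and the `run_snippet` helper are illustrative names, and the check runs the same assert-guarded code in two child interpreters, once normally and once with -O.

```python
import os
import subprocess
import sys

# Illustrative stand-in for the scheduler invariant: `waiting` is non-empty,
# so the assert should fail whenever asserts are enabled.
SNIPPET = (
    "waiting = [1]\n"
    "assert len(waiting) == 0, 'unexpected waiting requests'\n"
    "print('reached')\n"
)


def run_snippet(optimized: bool) -> subprocess.CompletedProcess:
    """Run SNIPPET in a child interpreter, optionally with the -O flag."""
    cmd = [sys.executable]
    if optimized:
        cmd.append("-O")
    cmd += ["-c", SNIPPET]
    # Drop PYTHONOPTIMIZE so the non-optimized run really keeps asserts enabled.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    return subprocess.run(cmd, capture_output=True, text=True, env=env)


normal = run_snippet(optimized=False)
optimized = run_snippet(optimized=True)

# Without -O the assert fires and the process exits with an error.
print("normal:", normal.returncode, "AssertionError" in normal.stderr)
# With -O the assert is stripped and execution continues past it.
print("optimized:", optimized.returncode, optimized.stdout.strip())
```

An explicit `if` check with a log call or a raised exception is not affected by -O, which is why the review suggests it for conditions that must hold in production.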
Consider replacing it with an explicit exception or a log warning:
if len(self.waiting) != 0:
    llm_logger.warning(
        f"prefill_schedule: unexpected {len(self.waiting)} waiting requests, skipping them."
    )
Or, if this state is unrecoverable:
if len(self.waiting) != 0:
    raise RuntimeError(
        f"Prefill scheduler should not have waiting requests, got {len(self.waiting)}"
    )

self.resource_manager.scheduler_unhandled_request_num = self._get_scheduler_unhandled_request_num()
# 2. Schedule requests
batch_request, error_tasks = self.resource_manager.schedule()
if self.cfg.scheduler_config.splitwise_role == "prefill":
🟡 Suggestion: prefill_schedule() is implemented only in resource_manager_v1; calling it on other implementations raises AttributeError
common_engine.py unconditionally calls self.resource_manager.prefill_schedule() for every instance with splitwise_role == "prefill", but the method has only been added in resource_manager_v1.py. If the project has other resource manager implementations (such as v2 or a future version), starting one of them in the prefill role will crash here.
Note that the code already uses a similar defensive pattern (a hasattr check for scheduler_unhandled_request_num). Either add the method to the other resource manager implementations, or add a defensive check:
if self.cfg.scheduler_config.splitwise_role == "prefill":
    if hasattr(self.resource_manager, "prefill_schedule"):
        batch_request, error_tasks = self.resource_manager.prefill_schedule()
    else:
        batch_request, error_tasks = self.resource_manager.schedule()
else:
    batch_request, error_tasks = self.resource_manager.schedule()
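The other option named in this comment, keeping every resource manager implementation in sync, could be sketched with a shared default method instead of a hasattr check at the call site. This is only an illustration: `ResourceManagerBase`, `LegacyManager`, `V1Manager`, and `fetch` are hypothetical names, and the real FastDeploy class hierarchy may look different.

```python
class ResourceManagerBase:
    """Hypothetical common base class; not the actual FastDeploy hierarchy."""

    def schedule(self):
        raise NotImplementedError

    def prefill_schedule(self):
        # Default behavior: fall back to the general scheduling path, so
        # implementations without a dedicated prefill path keep working.
        return self.schedule()


class LegacyManager(ResourceManagerBase):
    """Stand-in for an implementation that only provides schedule()."""

    def schedule(self):
        return ["legacy-batch"], []


class V1Manager(ResourceManagerBase):
    """Stand-in for an implementation with a dedicated prefill path."""

    def schedule(self):
        return ["general-batch"], []

    def prefill_schedule(self):
        return ["prefill-batch"], []


def fetch(resource_manager, splitwise_role):
    """Call-site dispatch without hasattr: the base class supplies the fallback."""
    if splitwise_role == "prefill":
        return resource_manager.prefill_schedule()
    return resource_manager.schedule()


print(fetch(LegacyManager(), "prefill"))
print(fetch(V1Manager(), "prefill"))
print(fetch(V1Manager(), "decode"))
```

Compared with the hasattr check, this keeps the dispatch in the engine to a single branch, at the cost of requiring all implementations to share a base class.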
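CI report generated from the following code (refreshed every 30 minutes):
1 Task overview
Currently 8/10 Required tasks pass; 2 Required tasks are still failing:
2 Task status summary
2.1 Required tasks: 8/10 passed
2.2 Optional tasks: 27/31 passed
3 Failure details (Required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: flaky issue / test failure (confidence: medium)
Failed cases:
Root cause details:
Key logs:
Fix suggestion:
Fix suggestion summary: Please rerun; in the longer term, add a readability check and atomic writes.
Related change: additional reads were added
Link: view logs
Approval: manual approval (confidence: high)
This job requires manual approval; CI continues only after the approval is completed.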
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7944 +/- ##
==========================================
Coverage ? 63.81%
==========================================
Files ? 467
Lines ? 65067
Branches ? 9976
==========================================
Hits ? 41522
Misses ? 20740
Partials ? 2805
Motivation
Prefill instances shared the same schedule() with Decode instances, introducing unnecessary waiting-queue and block-allocation overhead. This PR adds a dedicated prefill_schedule() to simplify the Prefill scheduling path.
Modifications
- prefill_schedule() that skips the waiting queue and directly schedules from running requests with full token budget.
- prefill_schedule() when splitwise_role == "prefill".
- cache_messager.py via pending_layer0_signals buffering.
- top_p_list sync and buffer sizing.
Usage or Command
No configuration change needed. Prefill instances automatically use the new path.
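For orientation, the prefill-only path described under Modifications (schedule directly from running requests against the full token budget) might look roughly like the toy sketch below. It is not the PR's implementation: `ToyRequest`, `prompt_len`, and `toy_prefill_schedule` are hypothetical names, and block allocation, error handling, and the real request objects are left out.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    request_id: str
    prompt_len: int               # total prompt tokens to prefill (hypothetical field)
    num_computed_tokens: int = 0  # tokens already prefilled in earlier steps


def toy_prefill_schedule(running, max_num_batched_tokens):
    """Pick prefill chunks for one step from `running` only (no waiting queue)."""
    token_budget = max_num_batched_tokens
    scheduled = []
    for req in running:
        if token_budget <= 0:
            break  # budget for this step is exhausted
        remaining = req.prompt_len - req.num_computed_tokens
        if remaining <= 0:
            continue  # prefill already finished; nothing to schedule in this sketch
        num_new_tokens = min(remaining, token_budget)
        scheduled.append((req.request_id, num_new_tokens))
        req.num_computed_tokens += num_new_tokens
        token_budget -= num_new_tokens
    return scheduled


running = [ToyRequest("a", prompt_len=6), ToyRequest("b", prompt_len=10)]
step1 = toy_prefill_schedule(running, max_num_batched_tokens=8)
step2 = toy_prefill_schedule(running, max_num_batched_tokens=8)
print(step1)  # [('a', 6), ('b', 2)]
print(step2)  # [('b', 8)]
```

In this toy version a long prompt is simply split across steps whenever it exceeds the remaining budget; whether the real method chunks requests the same way is an assumption.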
Accuracy Tests
No model computation changed. Accuracy unaffected.
Performance
GLM-4.5-Air, TP8, 1P3D, 39 concurrency, 78 multi-round chat sessions:
End-to-end throughput +16.7%.
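Checklist
- Add at least a tag in the PR title.
  - Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run pre-commit before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.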