
[Scheduler] Simplify scheduler for prefill instances#7944

Open
liyonghua0910 wants to merge 1 commit into PaddlePaddle:develop from liyonghua0910:develop+20260527_prefill_schedule

@liyonghua0910 liyonghua0910 commented May 27, 2026

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Prefill instances shared the same schedule() with Decode instances, introducing unnecessary waiting-queue and block-allocation overhead. This PR adds a dedicated prefill_schedule() to simplify the Prefill scheduling path.

Modifications

  • Add prefill_schedule() that skips the waiting queue and directly schedules from running requests with full token budget.
  • Dispatch to prefill_schedule() when splitwise_role == "prefill".
  • Fix layer0 signal race in cache_messager.py via pending_layer0_signals buffering.
  • Support NaiveProposer; fix CUDAGraph replay OOB access (CUDA error 700); fix top_p_list sync and buffer sizing.
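
The first two items can be pictured with a minimal, self-contained sketch. It is not the PR's implementation: `Request`, `need_prefill_tokens`, `sketch_prefill_schedule`, and `pick_schedule_fn` are hypothetical names, and block allocation and error-task handling are left out. It only illustrates the idea of walking the running requests with the full `max_num_batched_tokens` budget and dispatching on `splitwise_role`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    """Hypothetical stand-in for a running prefill request."""

    request_id: str
    prompt_len: int
    num_computed_tokens: int = 0

    @property
    def need_prefill_tokens(self) -> int:
        # Prompt tokens that have not been computed yet.
        return self.prompt_len - self.num_computed_tokens


def sketch_prefill_schedule(
    running: List[Request], max_num_batched_tokens: int
) -> List[Tuple[str, int]]:
    """Hand out prompt tokens to running requests until the budget is spent."""
    token_budget = max_num_batched_tokens
    scheduled = []
    for req in running:
        if token_budget <= 0:
            break
        num_new_tokens = min(req.need_prefill_tokens, token_budget)
        if num_new_tokens <= 0:
            continue  # this prompt is already fully computed
        scheduled.append((req.request_id, num_new_tokens))
        req.num_computed_tokens += num_new_tokens
        token_budget -= num_new_tokens
    return scheduled


def pick_schedule_fn(splitwise_role, schedule, prefill_schedule):
    """Assumed dispatch rule: prefill instances take the dedicated path."""
    return prefill_schedule if splitwise_role == "prefill" else schedule


running = [Request("a", 300), Request("b", 500)]
first_step = sketch_prefill_schedule(running, max_num_batched_tokens=512)
second_step = sketch_prefill_schedule(running, max_num_batched_tokens=512)
print(first_step)   # [('a', 300), ('b', 212)]
print(second_step)  # [('b', 288)]
```

In this sketch a prompt longer than the remaining budget is simply chunked across steps; whether the real method chunks in the same way is not stated in the description.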

Usage or Command

No configuration change needed. Prefill instances automatically use the new path.

Accuracy Tests

No model computation changed. Accuracy unaffected.

Performance

GLM-4.5-Air, TP8, 1P3D, 39 concurrency, 78 multi-round chat sessions:

| Metric | Before | After | Δ |
| --- | --- | --- | --- |
| Total time | 7912s | 6778s | -14.3% |
| Avg TTFT | 29.4s | 23.6s | -19.8% |
| P90 TTFT | 121.3s | 91.7s | -24.4% |

End-to-end throughput +16.7%.
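
The figures can be re-derived from the rounded values reported above. The sketch assumes the throughput gain is the ratio of total times for the same workload, which is one plausible reading of the +16.7% figure; `pct_change` is an illustrative helper.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


total_time_delta = pct_change(7912, 6778)
avg_ttft_delta = pct_change(29.4, 23.6)
p90_ttft_delta = pct_change(121.3, 91.7)

# Same workload finished in less time, so throughput scales with before/after.
throughput_gain = (7912 / 6778 - 1) * 100.0

print(f"total time {total_time_delta:.1f}%")   # total time -14.3%
print(f"P90 TTFT {p90_ttft_delta:.1f}%")       # P90 TTFT -24.4%
print(f"throughput +{throughput_gain:.1f}%")   # throughput +16.7%
```

From the rounded inputs, Avg TTFT recomputes to roughly -19.7%, which is within rounding of the reported -19.8%.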

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot Bot commented May 27, 2026

Thanks for your contribution!

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-27 20:33:56

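📋 Review summary

PR overview: Adds a dedicated prefill_schedule() method for splitwise prefill instances that skips the decode path and simplifies scheduling on P instances.
Scope of changes: fastdeploy/engine/common_engine.py, fastdeploy/engine/sched/resource_manager_v1.py
Affected tags: [Scheduler] [Engine] [PD Disaggregation]

Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | resource_manager_v1.py:1257 | assert is used for a runtime state check; it is silently disabled under Python -O |
| 🟡 Suggestion | common_engine.py:1122 | prefill_schedule() is implemented only in v1; calling it on other resource manager implementations raises AttributeError |

📝 PR convention check

The PR title format is compliant ([Scheduler] is an official tag) ✓. However, every section of the description is empty (containing only the template placeholder comments) and no Checklist item is ticked; these need to be filled in.

Suggested title (can be copied directly):

  • [Scheduler] Simplify scheduler for prefill instances

Suggested PR description (can be copied directly):
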
## Motivation
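Prefill instances (`splitwise_role == "prefill"`) currently reuse the general `schedule()` method, which contains decode allocation logic and adds unnecessary complexity for pure prefill instances. This PR adds a dedicated `prefill_schedule()` method that removes the unrelated decode scheduling path, uses `max_num_batched_tokens` as the full token budget, and handles only the prefill tasks in the running queue.

## Modifications
- `fastdeploy/engine/common_engine.py`: in `_fetch_request()`, call `resource_manager.prefill_schedule()` when `splitwise_role == "prefill"`; otherwise use the existing `schedule()`.
- `fastdeploy/engine/sched/resource_manager_v1.py`: add a `prefill_schedule()` method that skips decode requests, schedules prefill tasks one by one within the token budget, and updates `num_computed_tokens`.

## Usage or Command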
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

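Overall assessment

The approach is clear, and splitting out a dedicated scheduling path for prefill instances is a reasonable design. Two points need attention: the safety of the assert, and keeping the other resource manager implementations in sync. Merging is recommended once these are addressed.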

batch_request = BatchRequest()
token_budget = self.config.scheduler_config.max_num_batched_tokens

assert len(self.waiting) == 0, "Prefill scheduler should not have waiting requests"

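🟡 Suggestion: assert is used to check a runtime state invariant

When Python runs with -O (optimized mode), assert statements are skipped entirely. If self.waiting is non-empty, the assertion no longer fires, and scheduling then silently skips every waiting request, starving those requests without any log warning.

Consider replacing it with an explicit exception or a log warning: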

if len(self.waiting) != 0:
    llm_logger.warning(
        f"prefill_schedule: unexpected {len(self.waiting)} waiting requests, skipping them."
    )

Or, if this state is unrecoverable:

if len(self.waiting) != 0:
    raise RuntimeError(
        f"Prefill scheduler should not have waiting requests, got {len(self.waiting)}"
    )
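
The difference between the two styles can be reproduced without restarting the interpreter: the built-in `compile()` accepts an `optimize` argument, and `optimize=1` strips assert statements the same way `-O` does. The sketch below is illustrative only; `check_with_assert`, `check_explicit`, `load`, and `outcome` are hypothetical names, and a plain list stands in for `self.waiting`.

```python
SOURCE = '''
def check_with_assert(waiting):
    assert len(waiting) == 0, "Prefill scheduler should not have waiting requests"
    return "scheduled"
'''


def load(optimize: int):
    """Compile SOURCE at the given optimization level and return the function."""
    namespace = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["check_with_assert"]


def check_explicit(waiting):
    """Explicit check that survives any optimization level."""
    if len(waiting) != 0:
        raise RuntimeError(
            f"Prefill scheduler should not have waiting requests, got {len(waiting)}"
        )
    return "scheduled"


def outcome(fn, waiting):
    """Return the function result, or the name of the exception it raised."""
    try:
        return fn(waiting)
    except (AssertionError, RuntimeError) as exc:
        return type(exc).__name__


results = {
    "assert, default": outcome(load(0), ["req-1"]),
    "assert, -O": outcome(load(1), ["req-1"]),
    "explicit check": outcome(check_explicit, ["req-1"]),
}
print(results)
# {'assert, default': 'AssertionError', 'assert, -O': 'scheduled', 'explicit check': 'RuntimeError'}
```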

self.resource_manager.scheduler_unhandled_request_num = self._get_scheduler_unhandled_request_num()
# 2. Schedule requests
batch_request, error_tasks = self.resource_manager.schedule()
if self.cfg.scheduler_config.splitwise_role == "prefill":

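🟡 Suggestion: prefill_schedule() is implemented only in resource_manager_v1; calling it on other implementations raises AttributeError

common_engine.py unconditionally calls self.resource_manager.prefill_schedule() for every instance with splitwise_role == "prefill", but the method is currently added only in resource_manager_v1.py. If the project has other resource manager implementations (such as v2 or versions added later), starting one in the prefill role will crash here.

The code already uses a similar defensive pattern (a hasattr check on scheduler_unhandled_request_num). Consider either adding the method to the other resource manager implementations or adding a defensive check: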

if self.cfg.scheduler_config.splitwise_role == "prefill":
    if hasattr(self.resource_manager, "prefill_schedule"):
        batch_request, error_tasks = self.resource_manager.prefill_schedule()
    else:
        batch_request, error_tasks = self.resource_manager.schedule()
else:
    batch_request, error_tasks = self.resource_manager.schedule()
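
The other option mentioned above, keeping every implementation in sync, could look roughly like the following. This is a sketch under assumptions: `BaseResourceManager`, `ManagerV1`, `OtherManager`, and `fetch` are hypothetical and do not describe FastDeploy's actual class hierarchy. A shared base class supplies a default `prefill_schedule()` that falls back to `schedule()`, so the call site needs no hasattr check.

```python
class BaseResourceManager:
    """Hypothetical common base for resource manager implementations."""

    def schedule(self):
        raise NotImplementedError

    def prefill_schedule(self):
        # Default: implementations without a dedicated path use the general one.
        return self.schedule()


class ManagerV1(BaseResourceManager):
    def schedule(self):
        return ("general-batch", [])

    def prefill_schedule(self):
        return ("prefill-batch", [])


class OtherManager(BaseResourceManager):
    # No dedicated prefill path; inherits the fallback.
    def schedule(self):
        return ("general-batch", [])


def fetch(resource_manager, splitwise_role):
    """Call site stays a plain two-way branch."""
    if splitwise_role == "prefill":
        return resource_manager.prefill_schedule()
    return resource_manager.schedule()


v1_result = fetch(ManagerV1(), "prefill")
other_result = fetch(OtherManager(), "prefill")
print(v1_result)     # ('prefill-batch', [])
print(other_result)  # ('general-batch', [])
```

The trade-off is that a silent fallback hides a missing implementation, whereas the hasattr variant keeps the decision visible at the call site.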

PaddlePaddle-bot commented May 27, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-27 23:53:03

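The CI report is generated from the following code (updated every 30 minutes):


1 Task overview

Currently 8/10 Required tasks have passed and 2 Required tasks are still failing: Approval needs manual approval; the main test task hit an existing analysis cache, and its failure looks more like a read/write race or intermittent issue on the local file used by R3 routing replay. Recommend approving first and then rerunning.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 41(0) | 41 | 35 | 6 | 0 | 0 | 0 |

2 Task status summary

2.1 Required tasks: 8/10 passed

Required tasks block merging; failures must be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ❌ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | 1h23m | Flaky: R3 routing file read/write race | Please rerun; long term, add a readability check or atomic write | Job | - |
| ❌ | Approval | 19s | Approval required | Please complete the manual approval | Job | - |
| ✅ | The other 8 required tasks passed | - | - | - | - | - |

2.2 Optional tasks: 27/31 passed

Optional tasks do not block merging; failures are listed for reference only and are not analyzed in depth.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 1m55s | Job | - |
| ❌ | Check PR Template | 23s | Job | - |
| ❌ | CI_HPU | 1h21m | Job | - |
| ❌ | Trigger Jenkins for PR | 21s | Job | - |
| ✅ | The other 27 optional tasks passed | - | - | - |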

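3 Failure details (required only)

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: flaky issue / test failure (confidence: medium)

  • Status: ❌ Failed
  • Error type: Test failure
  • Confidence: Medium
  • Root cause summary: R3 routing file read/write race
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

| Test | Error | Root cause |
| --- | --- | --- |
| tests/e2e/test_EB_Lite_serving_R3.py::test_r3_accuracy | EOFError: Ran out of input | The test calls paddle.load as soon as the layer_0.pdtensor file appears and has probably read a file that was not fully written |

Root cause details:
The failure occurs at tests/e2e/utils/rollout_routing_replay_test_utils.py:185 while reading ./R3_tmp/routing_replay_output_eb45/r3_chat_completion_stream/layer_0.pdtensor, where paddle.load raises EOFError. On the test side, wait_for_file() only checks that the file exists; on the write side, RoutingStoreLocal.put() calls paddle.save(routing_indices, file_path) directly on the target path, so a read/write race is possible. The prefill_schedule() added by this PR runs only when splitwise_role == "prefill", and the launch arguments of this R3 case do not set --splitwise-role prefill, so no direct link to this PR's changes was found.

Key log lines: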

tests/e2e/test_EB_Lite_serving_R3.py:119
check_routing_replay_chat_completion(...)
tests/e2e/utils/rollout_routing_replay_test_utils.py:185
    generated_routing = paddle.load(cur_routing_path)
E   EOFError: Ran out of input
fastdeploy.log: ConnectionResetError: [Errno 104] Connection reset by peer

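Suggested fixes:

  1. Short term: rerun this Required job; the current failure looks more like a read/write race or intermittent failure on the local file used by R3 routing replay.
  2. Long term: in the wait logic at tests/e2e/utils/rollout_routing_replay_test_utils.py:133-185, continue only after verifying that the file can be loaded successfully with paddle.load, or change fastdeploy/model_executor/layers/moe/routing_indices_cache.py:825 to write to a temporary file and atomically rename it once the write completes.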
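
Both long-term options can be sketched in a few lines. The example is an assumption-laden illustration, not the project's code: `pickle` stands in for `paddle.save` / `paddle.load`, and `atomic_save` and `wait_until_loadable` are hypothetical helpers. The writer finishes a temporary file in the same directory and then publishes it with `os.replace`; the reader retries until the file actually loads instead of only checking that it exists.

```python
import os
import pickle
import tempfile
import time


def atomic_save(obj, file_path: str) -> None:
    """Write to a temp file in the target directory, then rename into place."""
    directory = os.path.dirname(os.path.abspath(file_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either no file or the complete file, never a partial one.
        os.replace(tmp_path, file_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def wait_until_loadable(file_path: str, timeout_s: float = 5.0, interval_s: float = 0.05):
    """Poll until the file exists AND deserializes, then return its content."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with open(file_path, "rb") as f:
                return pickle.load(f)
        except (FileNotFoundError, EOFError, pickle.UnpicklingError):
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval_s)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "layer_0.bin")
    atomic_save({"routing_indices": [3, 1, 2]}, path)
    loaded = wait_until_loadable(path)
    leftovers = [name for name in os.listdir(workdir) if name.endswith(".tmp")]

print(loaded)     # {'routing_indices': [3, 1, 2]}
print(leftovers)  # []
```

Either change alone would close the race described above; the atomic rename fixes it at the source, while the load-based wait only makes the test tolerant of it.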

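Fix summary: please rerun; long term, add a readability check or atomic write.

Related changes: the context of fastdeploy/engine/common_engine.py and fastdeploy/engine/sched/resource_manager_v1.py was also read; the PR switches to prefill_schedule() only when splitwise_role == "prefill", and no direct link to the failure path of this R3 case was found.

Link: View logs

Approval: manual approval (confidence: high)

This job requires manual Approval; CI continues only after the approval is completed.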

codecov-commenter commented May 27, 2026

Codecov Report

❌ Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@d0a9661). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/engine/sched/resource_manager_v1.py | 82.35% | 0 Missing and 3 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7944   +/-   ##
==========================================
  Coverage           ?   63.81%           
==========================================
  Files              ?      467           
  Lines              ?    65067           
  Branches           ?     9976           
==========================================
  Hits               ?    41522           
  Misses             ?    20740           
  Partials           ?     2805           

| Flag | Coverage Δ |
| --- | --- |
| GPU | 72.88% <85.00%> (?) |
| XPU | 7.06% <5.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

