[Model Runner] Support overlap schedule by Sunny-bot1 · Pull Request #6259 · PaddlePaddle/FastDeploy

Sunny-bot1 · 2026-01-28T09:19:16Z

Motivation

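There is currently a large time gap between FD steps. The main bottleneck is the synchronous DtoH (device-to-host) copies introduced during pre- and post-processing. Each synchronous copy blocks CPU execution in the upper-layer scheduling thread. That delays the launch of subsequent kernels and forces GPU computation and scheduling to run serially.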

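This PR and the related changes introduce asynchronous GPU scheduling. Asynchronous copies and event-based synchronization remove unnecessary hard CPU-GPU synchronization points. Model execution can then run in parallel with upper-layer scheduling and token write-back, which shortens the gap between FD steps and improves overall inference throughput.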
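
The overlap idea can be illustrated with a minimal, framework-free sketch. This is not FastDeploy's `execute_model_overlap`. A single-worker `ThreadPoolExecutor` stands in for an in-order GPU stream. `Future.result()` stands in for waiting on an event before reading copied-back tokens. `forward` and `postprocess` are hypothetical callables. The sketch assumes each step can be launched without the previous step's host-side results, for example because sampled tokens stay on the device.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_serial(batches, forward, postprocess):
    """Baseline: every step blocks on its own results before the next launch."""
    outputs = []
    for batch in batches:
        tokens = forward(batch)               # launch + blocking wait (like a sync DtoH copy)
        outputs.append(postprocess(tokens))   # CPU work while the "device" sits idle
    return outputs


def run_overlapped(batches, forward, postprocess):
    """Overlap: launch step N, then post-process step N-1 while step N runs."""
    outputs = []
    # One worker keeps "device" work in launch order, like a single stream.
    with ThreadPoolExecutor(max_workers=1) as device:
        pending = None
        for batch in batches:
            future = device.submit(forward, batch)   # non-blocking "launch"
            if pending is not None:
                # Wait only when the previous step's results are actually needed.
                outputs.append(postprocess(pending.result()))
            pending = future
        if pending is not None:
            outputs.append(postprocess(pending.result()))  # drain the last step
    return outputs


if __name__ == "__main__":
    def fake_forward(batch):
        time.sleep(0.02)          # pretend device compute time
        return [x * 2 for x in batch]

    def fake_postprocess(tokens):
        time.sleep(0.02)          # pretend CPU-side token write-back
        return sum(tokens)

    work = [[i] for i in range(10)]
    for runner in (run_serial, run_overlapped):
        start = time.perf_counter()
        runner(work, fake_forward, fake_postprocess)
        print(runner.__name__, f"{time.perf_counter() - start:.3f}s")
```

Both loops produce the same outputs. In the demo, the overlapped loop's wall time per step approaches the larger of the device and CPU times instead of their sum.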

Modifications

This PR:

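Eliminate the synchronous copy introduced by computing token_num_cpu: when decode batches are processed back to back, reuse the previous batch's token_num for the launch
tp_barrier: use a CPU barrier
Implement execute_model_overlap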
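
The token_num reuse rule in the first item can be sketched as follows, under stated assumptions. `Batch`, `TokenNumProvider`, and `token_num_for_launch` are hypothetical names, not FastDeploy code, and `_blocking_copy` stands in for a DtoH copy that stalls the CPU. The sketch only shows when the sync can be skipped. It does not model how the real runner handles requests that finish during a decode streak.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Batch:
    is_decode_only: bool  # hypothetical: True when the batch has no prefill tokens
    token_num: int        # true token count (device-resident in a real runner)


@dataclass
class TokenNumProvider:
    """Hypothetical helper that decides when a blocking DtoH copy is needed."""
    last_token_num: Optional[int] = None
    last_was_decode: bool = False
    sync_count: int = 0  # number of blocking copies performed so far

    def _blocking_copy(self, batch: Batch) -> int:
        # Stand-in for reading a device tensor on the host, which stalls the CPU.
        self.sync_count += 1
        return batch.token_num

    def token_num_for_launch(self, batch: Batch) -> int:
        reusable = (
            batch.is_decode_only
            and self.last_was_decode
            and self.last_token_num is not None
        )
        # Consecutive decode batches: reuse the cached value, no sync point.
        value = self.last_token_num if reusable else self._blocking_copy(batch)
        self.last_token_num = value
        self.last_was_decode = batch.is_decode_only
        return value
```

For a prefill batch followed by three decode batches, only the first two launches pay for a copy. The decode batches after that reuse the cached value.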

Results:

End-to-end improvement: about 10%

Model & config	TP	Concurrency	Input length	Output length	Total time	Decode speed	TPOT
GLM-4.5-Air	8	4	500	10k	1868	88	11.36
GLM-4.5-Air + overlap schedule	8	4	500	10k	1683	98	10.11
ERNIE-4.5-21B-A3B-Paddle	4	64	2k	500	683	80	12.75
ERNIE-4.5-21B-A3B-Paddle + overlap schedule	4	64	2k	500	622	91	11.21
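
The roughly 10% figure can be checked directly against the table rows. The snippet computes the signed relative change for each metric. No units are assumed because the ratios do not depend on them.

```python
# (total time, decode speed, TPOT) copied from the table: (baseline, overlap schedule)
RESULTS = {
    "GLM-4.5-Air": ((1868, 88, 11.36), (1683, 98, 10.11)),
    "ERNIE-4.5-21B-A3B-Paddle": ((683, 80, 12.75), (622, 91, 11.21)),
}


def relative_change(before: float, after: float) -> float:
    """Signed change of `after` relative to `before`, in percent."""
    return (after - before) / before * 100.0


def summarize(results):
    summary = {}
    for model, (base, overlap) in results.items():
        summary[model] = {
            "total_time": round(relative_change(base[0], overlap[0]), 2),
            "decode_speed": round(relative_change(base[1], overlap[1]), 2),
            "tpot": round(relative_change(base[2], overlap[2]), 2),
        }
    return summary


if __name__ == "__main__":
    for model, deltas in summarize(RESULTS).items():
        print(model, deltas)
```

Across both models, total time drops by about 9% to 10%, decode speed rises by about 11% to 14%, and TPOT drops by about 11% to 12%. This is consistent with the roughly 10% end-to-end figure.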

GLM, TP8: gap between steps reduced from 2 ms to 340 us
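
Assuming this figure is the idle gap between consecutive steps, the reduction works out as:

$$
\frac{2\,\text{ms} - 340\,\mu\text{s}}{2\,\text{ms}} = \frac{2000 - 340}{2000} = 0.83
$$

That is about 1.66 ms less idle time per step, an 83% reduction in the gap.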

TODO:

Usage or Command

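--enable-overlap-schedule: enables asynchronous (overlap) scheduling and is disabled by default. MTP is not supported yet, so this flag has no effect when MTP is enabled.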

python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} \
    --max-model-len 32768 \
    --max-num-seqs 128 \
    --port 8908 \
    --tensor-parallel-size 4 \
    --enable-overlap-schedule

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-01-28T09:19:22Z

Thanks for your contribution!


codecov-commenter · 2026-02-02T15:46:38Z

Codecov Report

❌ Patch coverage is 74.57627% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@c745a22). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/worker/gpu_model_runner.py	65.85%	11 Missing and 3 partials ⚠️
fastdeploy/worker/worker_process.py	80.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #6259   +/-   ##
==========================================
  Coverage           ?   67.93%           
==========================================
  Files              ?      389           
  Lines              ?    51886           
  Branches           ?     8077           
==========================================
  Hits               ?    35250           
  Misses             ?    14067           
  Partials           ?     2569

Flag	Coverage Δ
GPU	`67.93% <74.57%> (?)`



support overlap schedule

585d7f2

Sunny-bot1 had a problem deploying to Metax_ci January 28, 2026 09:19 — with GitHub Actions Error

add compute_token_num

c8a29cc

Sunny-bot1 had a problem deploying to Metax_ci January 28, 2026 09:34 — with GitHub Actions Failure

disable

44a41d0

Sunny-bot1 had a problem deploying to Metax_ci January 28, 2026 11:23 — with GitHub Actions Failure

fix token_num

fb66f46

Sunny-bot1 had a problem deploying to Metax_ci February 2, 2026 11:10 — with GitHub Actions Error

fix doc

2fbdf95

Sunny-bot1 had a problem deploying to Metax_ci February 2, 2026 11:13 — with GitHub Actions Error

Sunny-bot1 marked this pull request as draft February 2, 2026 11:36

fix

c322cbd

Sunny-bot1 had a problem deploying to Metax_ci February 2, 2026 12:51 — with GitHub Actions Failure

Sunny-bot1 marked this pull request as ready for review February 2, 2026 13:00

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy into overlap

a217629

Sunny-bot1 had a problem deploying to Metax_ci February 2, 2026 13:57 — with GitHub Actions Failure

fix

0f448c2

Sunny-bot1 had a problem deploying to Metax_ci February 3, 2026 07:29 — with GitHub Actions Failure

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy into overlap

c520795

Sunny-bot1 had a problem deploying to Metax_ci February 3, 2026 11:22 — with GitHub Actions Failure

fix

4bb06f9

Sunny-bot1 had a problem deploying to Metax_ci February 3, 2026 11:34 — with GitHub Actions Failure

fix

c1a18f5

Sunny-bot1 had a problem deploying to Metax_ci February 3, 2026 11:51 — with GitHub Actions Failure

zhoutianzi666 approved these changes Feb 4, 2026


zhoutianzi666 merged commit 9b0a82c into PaddlePaddle:develop Feb 4, 2026
20 of 24 checks passed

StareAtYou added a commit to StareAtYou/FastDeploy that referenced this pull request Feb 4, 2026

[Metax][Fix] fix issues based PaddlePaddle#6259

b96eed1

yuanlehome pushed a commit that referenced this pull request Feb 4, 2026

[Metax][Fix] fix issues based #6259 (#6338)

e109fb9

zhoutianzi666 merged 11 commits into PaddlePaddle:develop from Sunny-bot1:overlap

3 participants
