[KVCache] Storage cache supports c8 model #6298
Merged
juncaipeng merged 6 commits into PaddlePaddle:develop on Feb 6, 2026
Thanks for your contribution!
Pull request overview
This PR adds c8 model support to the storage cache, mainly by implementing reads and writes of the scale data for the block_wise_fp8 quantized cache.
Changes:
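- Extend the `query` method of mooncake_store to accept optional scale-key parameters
- Add cache scale support to CacheTransferManager, including initialization and management of the scale buffers
- Update the cache_block_stride computation in the CUDA kernel so it supports cache layouts with different dimensions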
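The first change above makes the storage lookup aware of scale tensors: for a c8 (block_wise_fp8) cache, a block is only reusable if its K/V scales are stored along with its K/V data. The sketch below illustrates that idea under stated assumptions. `count_stored_prefix_blocks` and the `exists` callback are hypothetical stand-ins, not the mooncake_store API, and returning the length of the leading run of fully stored blocks is an assumed contract for illustration only.

```python
from typing import Callable, List, Optional, Sequence


def count_stored_prefix_blocks(
    exists: Callable[[str], bool],
    k_keys: Sequence[str],
    v_keys: Sequence[str],
    k_scale_keys: Optional[Sequence[str]] = None,
    v_scale_keys: Optional[Sequence[str]] = None,
) -> int:
    """Hypothetical helper: count leading blocks whose K/V data (and, for c8
    caches, K/V scales) are all present in the store."""
    groups: List[Sequence[str]] = [k_keys, v_keys]
    # Scale keys are optional: non-quantized caches pass None, so only the
    # K/V data keys are checked.
    if k_scale_keys is not None:
        groups.append(k_scale_keys)
    if v_scale_keys is not None:
        groups.append(v_scale_keys)

    num_blocks = len(k_keys)
    if any(len(group) != num_blocks for group in groups):
        raise ValueError("all key lists must have one key per block")

    hit = 0
    for i in range(num_blocks):
        # A block counts as cached only if every tensor it needs is stored;
        # the first incomplete block ends the reusable prefix.
        if not all(exists(group[i]) for group in groups):
            break
        hit += 1
    return hit
```

For example, with a store holding `{"k0", "v0", "ks0", "vs0", "k1", "v1"}`, querying blocks 0 and 1 without scale keys reports 2 hits, while passing `["ks0", "ks1"]` and `["vs0", "vs1"]` as scale keys reports 1, because block 1 has no stored scales.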
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
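| fastdeploy/cache_manager/transfer_factory/mooncake_store/mooncake_store.py | Adds `k_scale_keys` and `v_scale_keys` parameters to the `query` method so it can check whether scale data exists |
| fastdeploy/cache_manager/cache_transfer_manager.py | Implements full block_wise_fp8 support, including scale-buffer initialization and handling of scale data in read and write operations |
| custom_ops/gpu_ops/swap_cache_layout.cu | Refactors the cache_block_stride computation to support cache layouts of different shapes (including the scale layout) |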
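The swap_cache_layout.cu change suggests the per-block stride should follow the tensor's own shape rather than assume a fixed rank, since a scale tensor has fewer trailing dimensions than the KV data it describes. A minimal sketch of that idea, assuming contiguous row-major tensors whose first dimension is the block index; `cache_block_stride` here is a hypothetical Python helper, not the kernel's code, and the example shapes are illustrative assumptions.

```python
from math import prod
from typing import Sequence


def cache_block_stride(shape: Sequence[int]) -> int:
    """Hypothetical helper: number of elements between consecutive blocks in
    a contiguous tensor whose first dimension is the block index."""
    if len(shape) < 2:
        raise ValueError("expected [num_blocks, ...] with at least 2 dims")
    # In row-major layout, advancing the leading (block) index skips every
    # element of the trailing dimensions, however many there are.
    return prod(shape[1:])
```

With an assumed KV cache shape `[100, 8, 64, 128]` (blocks, heads, block size, head dim) the stride is 65536 elements, and with an assumed scale shape `[100, 8, 64]` it is 512, so one helper covers both layouts.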
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #6298   +/-   ##
==========================================
  Coverage         ?   67.02%
==========================================
  Files            ?      387
  Lines            ?    51493
  Branches         ?     8030
==========================================
  Hits             ?    34513
  Misses           ?    14508
  Partials         ?     2472
Flags with carried forward coverage won't be shown.
6da40b1 to 58240a4
Jiang-Jia-Jun approved these changes on Feb 5, 2026
EmmonsCurse approved these changes on Feb 6, 2026
EmmonsCurse (Collaborator) left a comment:
LGTM~ Please add unit tests for cache_transfer_manager.py in a follow-up~
Motivation
Support c8 models in the global cache pool.
Modifications
Cache read/write module
Usage or Command
xxx
Accuracy Tests
xxx
Checklist
- PR title tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- For a PR to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.