[None][feat] Add radix tree cache priority boost and store chunked context blocks by lancelly · Pull Request #12696 · NVIDIA/TensorRT-LLM

lancelly · 2026-04-02T15:10:48Z

Summary

Cherry-pick Features A and B from PR #12481 (improve-scheduler) onto feat/bench_y.

Feature A: Radix tree cache priority boost — Blocks in the radix tree get priority boosted (kRadixTreeBlockPriority = kDefaultPriority + 10) on release, so truly-free blocks are evicted first, preserving cached data longer. Adds getNumTrulyFreeBlocks() to distinguish empty blocks from cached ones.
Feature B: Store chunked context blocks — New storeChunkedContextBlocks() stores full blocks from a completed (non-final) context chunk immediately, enabling prefix sharing with concurrent requests before the full context finishes. Integrated into both TRT and PyTorch backends. SWA windows are skipped.

…ntext blocks

Cherry-pick Features A and B from PR NVIDIA#12481 (improve-scheduler) onto feat/bench_y.

Feature A (Radix tree cache priority boost):
- Blocks in the radix tree get priority boosted (kRadixTreeBlockPriority = kDefaultPriority + 10) on release, so truly-free blocks are evicted first, preserving cached data longer.
- Adds getNumTrulyFreeBlocks() to distinguish empty blocks from cached ones.

Feature B (Store chunked context blocks):
- New storeChunkedContextBlocks() method stores full blocks from a completed (non-final) context chunk immediately, enabling prefix sharing with concurrent requests before the full context finishes processing.
- Integrated into both TRT and PyTorch (PyExecutor) backends.
- SWA windows are skipped (their block boundaries are unstable mid-context).

Note: Feature C (reservation-based scheduling) from PR NVIDIA#12481 is intentionally excluded. The pinning infrastructure (boostReservedBlock/unpinAllReservedBlocks) is also excluded as it has no call site without Feature C.

Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>

SimengLiu-nv · 2026-04-02T15:36:41Z

cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp

 auto const kMaxPriority = executor::KvCacheRetentionConfig::kMaxRetentionPriority;

 auto const kDefaultPriority = executor::KvCacheRetentionConfig::kDefaultRetentionPriority;
+// No longer used for priority boosting (pinning replaced it), but kept for backward compat.


Can remove this comment later.

github-actions bot assigned lancelly Apr 2, 2026

lancelly force-pushed the feat/bench_y_cherry_pick branch from e3dd6a4 to 85a4299 April 2, 2026 15:24

SimengLiu-nv approved these changes Apr 2, 2026


lancelly wants to merge 1 commit into NVIDIA:feat/bench_y from lancelly:feat/bench_y_cherry_pick
