feat: configure cudagraph capture batch sizes #4573

Open
CUHKSZzxy wants to merge 4 commits into InternLM:main from CUHKSZzxy:feat/cudagraph-capture-batch-sizes

@CUHKSZzxy CUHKSZzxy commented May 8, 2026

Summary

  • Add PytorchEngineConfig support for explicit CUDA graph capture batch sizes and route them through CacheConfig and graph runners.
  • Normalize configured sizes by validating positive integers, deduplicating, sorting, filtering to max_batch_size, and always including max_batch_size so valid scheduler batches stay covered without eager fallback.
  • Add focused unit coverage for normalization and CUDA graph capture-size selection.
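The normalization steps listed above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name `normalize_capture_batch_sizes` and its exact error handling are assumptions; only the steps themselves (validate, deduplicate, sort, filter to max_batch_size, always include max_batch_size) come from the summary.

```python
from typing import Iterable, List, Optional


def normalize_capture_batch_sizes(sizes: Optional[Iterable[int]],
                                  max_batch_size: int) -> Optional[List[int]]:
    """Hypothetical helper mirroring the normalization steps in the summary."""
    if sizes is None:
        # Nothing configured: the caller would fall back to inferred defaults.
        return None
    if max_batch_size <= 0:
        raise ValueError(f'max_batch_size must be positive, got {max_batch_size!r}')

    kept = set()  # a set deduplicates the configured sizes
    for size in sizes:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            raise ValueError(f'capture batch sizes must be positive integers, got {size!r}')
        # Drop sizes the scheduler can never produce.
        if size <= max_batch_size:
            kept.add(size)

    # Always cover the largest valid batch so it never needs eager fallback.
    kept.add(max_batch_size)
    return sorted(kept)


print(normalize_capture_batch_sizes([8, 1, 4, 4, 512], max_batch_size=256))  # [1, 4, 8, 256]
```

Including max_batch_size unconditionally means every batch the scheduler can emit has a captured graph at least as large as itself.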

Validation

  • Focused unit tests for cudagraph capture batch sizes.
  • Syntax checks for touched modules.
  • Real PyTorch API server smoke test with custom capture sizes confirmed normalization, graph capture, and graph replay under concurrent decode.

Assistance

Assisted with Codex + GPT-5.5 High

@lvhan028 lvhan028 requested review from grimoire and removed request for grimoire May 8, 2026 14:40
@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review May 26, 2026 08:07
Copilot AI review requested due to automatic review settings May 26, 2026 08:07

Copilot AI left a comment

Pull request overview

This PR adds an explicit configuration path for CUDA graph capture batch sizes in the PyTorch engine, threading the configured sizes from PytorchEngineConfig through CacheConfig into graph runners.

Changes:

  • Add cudagraph_capture_batch_sizes to PytorchEngineConfig and CacheConfig, and plumb it through config building and spec-decode config cloning.
  • Update graph runners to prefer the configured capture sizes over the inferred (power-of-two) defaults.
  • Expose the new option via the lmdeploy serve api_server CLI.
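A rough sketch of how a graph runner might pick a capture size under this change, assuming the runner pads a batch up to the smallest captured size that fits. The function names and the exact shape of the power-of-two default are assumptions for illustration and are not taken from the lmdeploy source.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def default_capture_batch_sizes(max_batch_size: int) -> List[int]:
    """Assumed default: powers of two below max_batch_size, plus max_batch_size."""
    sizes = []
    size = 1
    while size < max_batch_size:
        sizes.append(size)
        size *= 2
    sizes.append(max_batch_size)
    return sizes


def select_capture_batch_size(batch_size: int,
                              max_batch_size: int,
                              configured: Optional[Sequence[int]] = None) -> int:
    """Return the smallest capture size that can hold batch_size (hypothetical)."""
    # Prefer explicitly configured sizes; otherwise use the inferred defaults.
    sizes = sorted(configured) if configured else default_capture_batch_sizes(max_batch_size)
    idx = bisect_left(sizes, batch_size)
    if idx == len(sizes):
        raise ValueError(f'batch size {batch_size} exceeds the largest capture size {sizes[-1]}')
    return sizes[idx]


print(select_capture_batch_size(5, max_batch_size=48))                         # 8
print(select_capture_batch_size(5, max_batch_size=48, configured=[1, 6, 48]))  # 6
```

With configured sizes, a batch of 5 is padded to 6 instead of 8, which is the kind of tuning the explicit option makes possible.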

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| lmdeploy/pytorch/engine/config_builder.py | Validates/normalizes configured capture sizes and passes them into CacheConfig. |
| lmdeploy/pytorch/config.py | Adds cudagraph_capture_batch_sizes to CacheConfig and propagates it in SpecDecodeConfig.from_config. |
| lmdeploy/pytorch/backends/graph_runner.py | Uses configured capture sizes when present. |
| lmdeploy/pytorch/backends/cuda/graph_runner.py | Uses configured capture sizes when present (CUDA backend). |
| lmdeploy/messages.py | Adds cudagraph_capture_batch_sizes to PytorchEngineConfig and documents it. |
| lmdeploy/cli/utils.py | Adds --cudagraph-capture-batch-sizes argument. |
| lmdeploy/cli/serve.py | Wires the CLI argument into PytorchEngineConfig for the API server. |
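
For readers unfamiliar with list-valued CLI options, here is one way a flag like --cudagraph-capture-batch-sizes could be declared with argparse. Only the flag name comes from this PR; the space-separated integer form (`nargs='+'`), the `None` default, and the parser name are assumptions, and the actual definition in lmdeploy/cli/utils.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Standalone sketch; not the parser used by lmdeploy."""
    parser = argparse.ArgumentParser(prog='api_server_sketch')
    parser.add_argument(
        '--cudagraph-capture-batch-sizes',
        type=int,
        nargs='+',      # assumed: one or more space-separated integers
        default=None,   # assumed: None means "use the inferred defaults"
        help='Explicit batch sizes to capture as CUDA graphs.')
    return parser


args = build_parser().parse_args(['--cudagraph-capture-batch-sizes', '1', '4', '16'])
# argparse maps the dashes in the flag to underscores in the attribute name.
print(args.cudagraph_capture_batch_sizes)  # [1, 4, 16]
```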

Comment thread lmdeploy/pytorch/engine/config_builder.py Outdated
Comment thread lmdeploy/pytorch/backends/graph_runner.py
Comment thread lmdeploy/pytorch/backends/cuda/graph_runner.py
Comment thread lmdeploy/pytorch/engine/config_builder.py Outdated
@CUHKSZzxy CUHKSZzxy force-pushed the feat/cudagraph-capture-batch-sizes branch from 774b5d9 to 93496c5 Compare May 26, 2026 08:31