fix: handle missing KV cache without crashing engine #4497
Open
lvhan028 wants to merge 6 commits into InternLM:main from
Conversation
Contributor
Pull request overview
This PR improves robustness around engine sleep/wakeup by rejecting inference requests while sleeping and preventing PyTorch-engine inference from crashing when KV/state cache engines are missing (e.g., after sleep or partial wakeup).
Changes:
- Make sleep/wakeup flows async end-to-end (OpenAI server → AsyncEngine → backend engines/executors), offloading blocking work to threads where appropriate (see the sketch after this list).
- Add an epoch-stamping mechanism that drops work bound to a session before a stop-all/abort-all event, avoiding races during sleep.
- Convert missing cache situations into structured internal-engine errors instead of uncaught exceptions, and propagate them through the engine loop.
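
A rough sketch of the thread-offloading pattern mentioned above, assuming blocking sleep/wakeup implementations with hypothetical names (this is not the PR's code, only an illustration of the asyncio.to_thread idea the turbomind.py change refers to):

```python
import asyncio


class EngineBackend:
    """Hypothetical backend whose sleep/wakeup are blocking calls."""

    def _sleep_blocking(self, level: int = 1) -> None:
        # Placeholder: release device memory, offload weights, etc.
        ...

    def _wakeup_blocking(self) -> None:
        # Placeholder: re-allocate caches and reload weights.
        ...

    async def sleep(self, level: int = 1) -> None:
        # Run the blocking work in a thread so the event loop that
        # serves HTTP requests is never stalled.
        await asyncio.to_thread(self._sleep_blocking, level)

    async def wakeup(self) -> None:
        await asyncio.to_thread(self._wakeup_blocking)
```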
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lmdeploy/turbomind/turbomind.py | Makes TurboMind sleep/wakeup async via asyncio.to_thread wrappers. |
| lmdeploy/serve/openai/api_server.py | Rejects inference requests while sleeping; stamps session epoch; makes /sleep and /wakeup await engine operations. |
| lmdeploy/serve/managers/session_manager.py | Adds Session.epoch tracking and logs epoch on abort. |
| lmdeploy/serve/core/async_engine.py | Adds stale-session dropping via epoch; makes sleep/wakeup await backend; adjusts metrics increments for new abort/error paths. |
| lmdeploy/pytorch/engine/mp_engine/base.py | Converts MP engine sleep/wakeup RPC calls to async. |
| lmdeploy/pytorch/engine/mp_engine/base_worker.py | Converts MP worker sleep/wakeup to async and awaits engine methods. |
| lmdeploy/pytorch/engine/model_agent/agent.py | Introduces CacheNotReadyError, guards cache usage, and converts cache-missing failures into batched outputs carrying an engine error message (sketched after this table). |
| lmdeploy/pytorch/engine/model_agent/__init__.py | Exposes CacheNotReadyError from the model_agent package. |
| lmdeploy/pytorch/engine/executor/uni_executor.py | Adds async sleep/wakeup plumbing for the single-device executor. |
| lmdeploy/pytorch/engine/executor/ray_executor.py | Makes sleep/wakeup async and offloads blocking RPC calls to threads. |
| lmdeploy/pytorch/engine/executor/mp_executor.py | Adds async sleep/wakeup implemented via collective_rpc_async. |
| lmdeploy/pytorch/engine/executor/base.py | Updates executor interface: wakeup is now async. |
| lmdeploy/pytorch/engine/executor/base_worker.py | Updates worker wrapper interface: wakeup is now async. |
| lmdeploy/pytorch/engine/engine.py | Makes engine sleep/wakeup async and awaits executor implementations. |
| lmdeploy/pytorch/engine/engine_loop.py | Treats engine_error_msg in BatchedOutputs as an internal engine error and finishes running requests accordingly. |
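
The agent.py and engine_loop.py rows describe turning a missing-cache condition into a structured error rather than an uncaught exception. A minimal sketch of that pattern, with illustrative class and field names that are not the exact lmdeploy definitions:

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CacheNotReadyError(RuntimeError):
    """Raised when the KV/state cache engine is missing, e.g. after sleep."""


@dataclass
class BatchedOutputs:
    """Illustrative output container; the real one also carries token ids."""
    engine_error_msg: Optional[str] = None


def forward_step(cache_engine, run_model: Callable) -> BatchedOutputs:
    """Guard cache usage and convert failures into a reportable output."""
    try:
        if cache_engine is None:
            raise CacheNotReadyError('KV cache engine is not initialized; '
                                     'the engine may be sleeping.')
        return run_model(cache_engine)
    except CacheNotReadyError as e:
        # Instead of crashing the engine loop, attach the error so the
        # loop can finish the running requests with an internal error.
        return BatchedOutputs(engine_error_msg=str(e))
```

The engine loop can then check engine_error_msg on the returned outputs and finish running requests with an internal-engine error, as the engine_loop.py row above describes.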
lmdeploy/serve/core/async_engine.py
Outdated
def _if_session_stale(self, session: Session,
                      input_token_len: int) -> GenOut | None:
    """If session is stamped ``http_bind_epoch`` by api_server and
Docstring refers to http_bind_epoch, but the stamped attribute is session.epoch (set in api_server.VariableInterface.get_session). Consider updating the wording to avoid suggesting a non-existent field/name.
Suggested change
- """If session is stamped ``http_bind_epoch`` by api_server and
+ """If api_server stamped the session's ``epoch`` and
No description provided.