[TRTLLM-10703][feature] abort, resume for Async RL in verl#12272
hchings wants to merge 12 commits into NVIDIA:main from
Conversation
Force-pushed from b56ec4d to e3294ca
Commit "update" (Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>)
Force-pushed from e3294ca to 98bc625
/bot run
📝 Walkthrough

Changes introduce pause/resume generation functionality to AsyncLLM, add an abort_all_requests method to RayExecutor, add a reset_prefix_cache method to WorkerExtension, and include a new test validating prefix-cache reset behavior with KV-cache reuse metrics.
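The pause/resume semantics summarized above can be sketched with a minimal stand-in. The method names (`pause_generation`, `resume_generation`, `generate_async`) come from this PR; the class internals below are illustrative only, not the actual TensorRT-LLM implementation:

```python
import asyncio

# Minimal stand-in for the pause/resume gate added in this PR. The method
# names come from the review; the internals below are illustrative only.
class PausableEngine:
    def __init__(self):
        self._paused = False

    async def pause_generation(self):
        # The real implementation also aborts in-flight requests here.
        self._paused = True

    async def resume_generation(self):
        self._paused = False

    async def generate_async(self, prompt):
        if self._paused:
            raise RuntimeError("generation is paused")
        await asyncio.sleep(0)  # yield, as real async generation would
        return f"completion for: {prompt}"

async def main():
    engine = PausableEngine()
    out = [await engine.generate_async("hello")]
    await engine.pause_generation()
    try:
        await engine.generate_async("blocked")
    except RuntimeError as err:
        out.append(str(err))
    await engine.resume_generation()
    out.append(await engine.generate_async("world"))
    return out

print(asyncio.run(main()))
# → ['completion for: hello', 'generation is paused', 'completion for: world']
```

The key behavior an RL framework like verl relies on is that a paused engine rejects new submissions fast, rather than queueing them behind a weight sync.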
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unittest/llmapi/test_async_llm.py (1)
140-178: Please add direct test coverage for pause/resume abort semantics. This test validates `reset_prefix_cache` well, but the new feature also adds `pause_generation()`/`resume_generation()` and `abort_all_requests()`. A targeted async test should assert: (1) new submissions fail while paused, (2) in-flight requests are aborted, and (3) submissions succeed again after resume.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/llmapi/test_async_llm.py` around lines 140-178: add a new async test that mirrors the `reset_prefix_cache` pattern but specifically exercises `AsyncLLM.pause_generation()`, `AsyncLLM.resume_generation()`, and `AsyncLLM.abort_all_requests()`. Open an AsyncLLM context (same model, kv_cache_config, sampling params), start a long-running `generate_async` call (e.g., with a high max_tokens or a prompt that triggers streaming) and concurrently call `pause_generation()` (or `llm.collective_rpc("pause_generation")` if RPC-only); assert that new `generate_async` submissions immediately fail/raise with a paused/validation error; call `abort_all_requests()` and assert the in-flight `generate_async` finishes in an aborted/error state; then call `resume_generation()` and assert subsequent `generate_async` calls succeed normally. Reference AsyncLLM, generate_async, pause_generation, resume_generation, and abort_all_requests so the test locates the right methods.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/async_llm.py`:
- Around line 105-109: pause_generation currently calls the synchronous method
_executor.abort_all_requests(), which blocks the asyncio event loop; change
pause_generation to call that synchronous function via await
asyncio.to_thread(self._executor.abort_all_requests) so the abort runs off the
event loop, and ensure asyncio is imported; update the async def
pause_generation method to set self._paused = True then await
asyncio.to_thread(...) to perform the abort without blocking.
In `@tensorrt_llm/executor/ray_executor.py`:
- Around line 312-315: The loop in abort_all_requests builds
list(self._results.values()) without synchronizing access to self._results,
which can be mutated concurrently; fix by acquiring the mutex that protects
_results (e.g., self._results_lock or the existing lock used around
submission/response paths) while copying the values into a local list, then
release the lock and iterate over that copied list calling result.abort() so
aborts run without holding the lock. Ensure you reference the abort_all_requests
method and the _results container when applying the change.
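The copy-under-lock pattern the comment describes can be sketched as follows. The `_results_lock` name and the `Result` class are assumptions for illustration, not the actual implementation:

```python
import threading

# Sketch of the copy-then-iterate fix: snapshot self._results under a lock,
# then call abort() on each result without holding the lock. The lock name
# (_results_lock) and the Result class are assumptions for illustration.
class Result:
    def __init__(self):
        self.aborted = False

    def abort(self):
        self.aborted = True

class RayExecutorSketch:
    def __init__(self):
        self._results = {}
        self._results_lock = threading.Lock()

    def submit(self, req_id):
        with self._results_lock:
            result = self._results[req_id] = Result()
        return result

    def abort_all_requests(self):
        with self._results_lock:
            pending = list(self._results.values())  # copy while locked
        for result in pending:  # abort outside the lock
            result.abort()

ex = RayExecutorSketch()
r1, r2 = ex.submit(1), ex.submit(2)
ex.abort_all_requests()
print(r1.aborted, r2.aborted)  # → True True
```

Releasing the lock before calling `abort()` avoids holding it across potentially slow per-request teardown and prevents deadlocks if `abort()` itself touches the executor.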
---
Nitpick comments:
In `@tests/unittest/llmapi/test_async_llm.py`:
- Around line 140-178: Add a new async test in
tests/unittest/llmapi/test_async_llm.py that mirrors the reset_prefix_cache
pattern but specifically exercises AsyncLLM.pause_generation(),
AsyncLLM.resume_generation(), and AsyncLLM.abort_all_requests(): open an
AsyncLLM context (same model, kv_cache_config, sampling params), start a
long-running generate_async call (e.g., with a high max_tokens or a prompt that
triggers streaming) and concurrently call pause_generation() (or
llm.collective_rpc("pause_generation") if RPC-only), assert that new
generate_async submissions immediately fail/raise with a paused/validation
error, call abort_all_requests() and assert the in-flight generate_async
finishes with an aborted/error state, then call resume_generation() and assert
subsequent generate_async calls succeed normally. Reference AsyncLLM,
generate_async, pause_generation, resume_generation, and abort_all_requests so
the test locates the right methods.
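A hypothetical shape for the suggested test, using a stub in place of `AsyncLLM` so the control flow runs standalone (the real test would exercise the actual API as described in the comment above):

```python
import asyncio

# Hypothetical shape of the suggested test, using a stub in place of AsyncLLM
# so it runs standalone: (1) pause aborts in-flight work, (2) new submissions
# fail while paused, (3) submissions succeed again after resume.
class FakeLLM:
    def __init__(self):
        self._paused = False
        self._inflight = set()

    async def generate_async(self, prompt, steps=1):
        if self._paused:
            raise RuntimeError("paused")
        task = asyncio.current_task()
        self._inflight.add(task)
        try:
            for _ in range(steps):  # stands in for token-by-token generation
                await asyncio.sleep(0.01)
            return "done"
        finally:
            self._inflight.discard(task)

    async def pause_generation(self):
        self._paused = True
        for task in list(self._inflight):  # abort in-flight requests
            task.cancel()

    async def resume_generation(self):
        self._paused = False

async def run_test():
    llm = FakeLLM()
    long_req = asyncio.ensure_future(llm.generate_async("long", steps=100))
    await asyncio.sleep(0.03)  # let the request get in flight
    await llm.pause_generation()
    try:
        await long_req
        aborted = False
    except asyncio.CancelledError:
        aborted = True
    try:
        await llm.generate_async("blocked")
        rejected = False
    except RuntimeError:
        rejected = True
    await llm.resume_generation()
    resumed = await llm.generate_async("after-resume") == "done"
    return aborted, rejected, resumed

print(asyncio.run(run_test()))  # → (True, True, True)
```

The three assertions mirror the three semantics the reviewer asked to cover; against the real `AsyncLLM`, the expected exception types would need to match whatever the implementation actually raises.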
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d18175ed-7c06-4ed7-b46b-ce66ef8bd1c6
📒 Files selected for processing (4)
tensorrt_llm/_torch/async_llm.py
tensorrt_llm/executor/ray_executor.py
tensorrt_llm/llmapi/rlhf_utils.py
tests/unittest/llmapi/test_async_llm.py
Force-pushed from 88f495b to 08ef80a
Force-pushed from 08ef80a to 7260753
/bot run
PR_Github #41009 [ run ] triggered by Bot.
PR_Github #41009 [ run ] completed.

/bot run
PR_Github #41263 [ run ] triggered by Bot.
PR_Github #41263 [ run ] completed.

/bot run
PR_Github #42407 [ run ] completed.

/bot run
PR_Github #42735 [ run ] triggered by Bot.
PR_Github #42735 [ run ] completed.

/bot run
PR_Github #42943 [ run ] triggered by Bot.
PR_Github #42943 [ run ] completed.

/bot run
PR_Github #43092 [ run ] triggered by Bot.
PR_Github #43092 [ run ] completed.

/bot run
PR_Github #43191 [ run ] triggered by Bot.
PR_Github #43191 [ run ] completed.

/bot run
PR_Github #43283 [ run ] triggered by Bot.
PR_Github #43283 [ run ] completed.

/bot run
/bot kill
PR_Github #43551 [ run ] triggered by Bot.
PR_Github #43553 [ kill ] triggered by Bot.
PR_Github #43551 [ run ] completed.
PR_Github #43553 [ kill ] completed.

/bot run
PR_Github #43682 [ run ] triggered by Bot.
PR_Github #43682 [ run ] completed.
Summary by CodeRabbit
Release Notes
New Features
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.