[#12712][feat] AutoDeploy Model Onboarding Sprint 03/19 - Part 1 (infra only) #12708

govind-ramnarayan wants to merge 5 commits into NVIDIA:main from
Conversation
/bot help

GitHub Bot Help

Provide a user friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

Details:

- `run`: Launch build/test pipelines. All previously running jobs will be killed.
- `kill`: Kill all running builds associated with pull request.
- `skip`: Skip testing for latest commit on pull request.
- `reuse-pipeline`: Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot run --disable-fail-fast
📝 Walkthrough

The pull request introduces infrastructure changes to support new models, shifting default optimization backends.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 3 failed (2 warnings, 1 inconclusive)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/auto_deploy/transform/library/_onnx_schemas.py (1)
Line 1: ⚠️ Potential issue | 🟡 Minor

Update copyright year to 2026.

The file has been modified and the copyright year should reflect the current year.

```diff
-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2025-2026, NVIDIA CORPORATION. All rights reserved.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/auto_deploy/transform/library/_onnx_schemas.py` at line 1, Update the copyright header at the top of the file by changing the year from 2025 to 2026; locate the top-of-file comment string that currently reads "Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved." and edit it to "Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved." so the file header reflects the current year.
🧹 Nitpick comments (1)
tests/unittest/auto_deploy/singlegpu/transformations/library/test_fuse_trtllm_attention_quant_fp8.py (1)
Lines 247-253: Consider using the `default_max_num_tokens` helper for consistency.

Other test files in this PR (e.g., `test_gated_delta_rule_cache.py`, `test_torch_gated_delta_rule_cache.py`) use the `default_max_num_tokens(max_seq_len, max_batch_size)` helper, which computes `(max_seq_len + 1) * max_batch_size`. Here, the hardcoded value `256` differs from what the formula would produce: `(64 + 1) * 4 = 260`.

If the specific value `256` is intentional for this test, consider adding a brief comment explaining why. Otherwise, consider importing and using `default_max_num_tokens` for consistency.

Also applies to: 296-302
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/auto_deploy/singlegpu/transformations/library/test_fuse_trtllm_attention_quant_fp8.py` around lines 247 - 253, The test instantiates CachedSequenceInterface with a hardcoded max_num_tokens=256 which is inconsistent with the project's helper; replace the literal with default_max_num_tokens(max_seq_len, max_batch_size) (importing default_max_num_tokens) so max_num_tokens is computed as (max_seq_len + 1) * max_batch_size for consistency with other tests, or if 256 is intentional add a brief inline comment next to the CachedSequenceInterface invocation explaining why it differs; target the CachedSequenceInterface construction and the variables max_seq_len and max_batch_size referenced in the test (also update the similar occurrence around the 296-302 block).
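The arithmetic behind this nitpick is easy to check. Below is a hypothetical sketch of the `default_max_num_tokens` helper the comment refers to; the real one lives in `_model_test_utils.py` and its exact signature may differ:

```python
# Hypothetical sketch of the default_max_num_tokens helper referenced above.
# The real helper is in tests/unittest/auto_deploy/_utils_test/_model_test_utils.py;
# its exact signature may differ.
def default_max_num_tokens(max_seq_len: int, max_batch_size: int) -> int:
    """Compute the token budget, reserving one extra token per sequence
    (a WAR for flashinfer issue #4504, per the commit message in this PR)."""
    return (max_seq_len + 1) * max_batch_size


# For the values used in this test, the formula gives 260, not the
# hardcoded 256:
print(default_max_num_tokens(64, 4))  # prints 260
```

This is why the reviewer flags `256` as a magic number: it silently deviates from the formula every other test uses.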
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/agents/ad-onboard-reviewer.md:
- Around line 47-55: Remove the duplicated "BB. Vision / Multi-Modal Support"
checklist block by deleting the second occurrence of the header and its table
(the repeated "BB. Vision / Multi-Modal Support" section that duplicates the
earlier BB1–BB2 table), leaving a single canonical BB section; ensure any
references to BB1 and BB2 remain intact and that the file contains only one
instance of that header and its three-column table so the checklist is not
duplicated.
In `@AGENTS.md`:
- Around line 126-133: The GitHub CLI guidance under the "GitHub CLI
authentication (`GH_CONFIG_DIR`)" section in AGENTS.md appears unrelated to the
current paperclip PR; either add a short clarifying note explaining when/why
agents should set GH_CONFIG_DIR (e.g., for workflows that interact with forks or
run external gh commands) or remove/move this whole section to a separate
documentation PR dedicated to GitHub CLI workflows; update the AGENTS.md header
"GitHub CLI authentication (`GH_CONFIG_DIR`)" and include a one-line rationale
if keeping it so future reviewers know its intended scope.
In `@examples/auto_deploy/build_and_run_ad.py`:
- Around line 100-104: The VS Code launch configuration still passes the removed
BenchmarkConfig.enabled flag which causes Pydantic validation to fail
(ExperimentConfig uses extra="forbid"); open the launch.json used for
running/debugging and remove the "--benchmark.enabled=false" argument so no
unknown "--benchmark.enabled" CLI flag is passed, ensuring runs use the updated
BenchmarkConfig (results_path/store_results) and ExperimentConfig without
validation errors.
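The failure mode described above, where an unknown CLI flag is rejected by a config model declared with `extra="forbid"`, can be reproduced with a small stdlib sketch. The class, its fields, and the constructor helper below are illustrative stand-ins, not the actual TRT-LLM Pydantic definitions:

```python
from dataclasses import dataclass, fields

# Stdlib stand-in for Pydantic's extra="forbid" behavior; BenchmarkConfig
# and its fields here are illustrative, not the real TRT-LLM definitions.
@dataclass
class BenchmarkConfig:
    results_path: str = ""
    store_results: bool = False

def construct_forbidding_extra(cls, **kwargs):
    """Reject any keyword that is not a declared field, the way a model
    with extra="forbid" rejects the stale --benchmark.enabled flag."""
    allowed = {f.name for f in fields(cls)}
    extra = set(kwargs) - allowed
    if extra:
        raise ValueError(f"extra fields forbidden: {sorted(extra)}")
    return cls(**kwargs)

construct_forbidding_extra(BenchmarkConfig, store_results=True)   # ok
try:
    construct_forbidding_extra(BenchmarkConfig, enabled=False)    # stale flag
except ValueError as e:
    print(e)  # extra fields forbidden: ['enabled']
```

This mirrors why leaving `--benchmark.enabled=false` in launch.json fails validation at startup rather than being silently ignored.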
In `@tensorrt_llm/_torch/auto_deploy/llm.py`:
- Around line 49-57: The current normalization mutates inputs["messages"]
in-place (variable messages) when self.processor has image_processor by
assigning to msg["content"], which can produce caller-visible side effects; fix
by working on a copy: create a shallow/deep copy of inputs["messages"] (e.g.,
clone messages into a new list of dicts) before the loop and modify that copy
(use that copy for downstream processing) so inputs["messages"] remains
unchanged; update any places that use the original variable to use the copied
variable instead (refer to variables/messages and the hasattr(self.processor,
"image_processor") block).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 270f293f-90b8-4aba-b26e-a8d98a56b76c
📒 Files selected for processing (25)
- .claude/agents/ad-onboard-reviewer.md
- AGENTS.md
- examples/auto_deploy/build_and_run_ad.py
- tensorrt_llm/_torch/auto_deploy/config/default.yaml
- tensorrt_llm/_torch/auto_deploy/custom_ops/attention/trtllm_attention.py
- tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py
- tensorrt_llm/_torch/auto_deploy/custom_ops/utils/torch_gather_logits.py
- tensorrt_llm/_torch/auto_deploy/llm.py
- tensorrt_llm/_torch/auto_deploy/llm_args.py
- tensorrt_llm/_torch/auto_deploy/shim/interface.py
- tensorrt_llm/_torch/auto_deploy/transform/library/_onnx_schemas.py
- tensorrt_llm/_torch/auto_deploy/utils/benchmark.py
- tests/unittest/auto_deploy/_utils_test/_model_test_utils.py
- tests/unittest/auto_deploy/multigpu/smoke/test_ad_build_small_multi.py
- tests/unittest/auto_deploy/singlegpu/custom_ops/test_resource_handlers.py
- tests/unittest/auto_deploy/singlegpu/shim/test_cached_sequence_interface.py
- tests/unittest/auto_deploy/singlegpu/shim/test_engine.py
- tests/unittest/auto_deploy/singlegpu/shim/test_llm_config.py
- tests/unittest/auto_deploy/singlegpu/smoke/test_ad_guided_decoding_regex.py
- tests/unittest/auto_deploy/singlegpu/smoke/test_ad_trtllm_sampler.py
- tests/unittest/auto_deploy/singlegpu/transformations/library/test_fuse_trtllm_attention_quant_fp8.py
- tests/unittest/auto_deploy/singlegpu/transformations/library/test_gated_delta_rule_cache.py
- tests/unittest/auto_deploy/singlegpu/transformations/library/test_kv_cache.py
- tests/unittest/auto_deploy/singlegpu/transformations/library/test_mrope_delta_cache.py
- tests/unittest/auto_deploy/singlegpu/transformations/library/test_torch_gated_delta_rule_cache.py
💤 Files with no reviewable changes (2)
- tests/unittest/auto_deploy/multigpu/smoke/test_ad_build_small_multi.py
- tensorrt_llm/_torch/auto_deploy/utils/benchmark.py
PR_Github #41497 [ run ] triggered by Bot. Commit:
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Nothing imports this module — it is dead code cleaned up as part of the paperclip infra consolidation. Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
The comment was historical context about switching to graph mode. Transformers mode deprecation will be tracked separately. Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Centralizes the (max_seq_len + 1) * max_batch_size formula (a WAR for flashinfer issue NVIDIA#4504) into a single helper in _model_test_utils.py. Replaces hardcoded magic numbers (129 * 4) and inline formulas across 7 test files. Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
- Remove duplicate BB Vision/Multi-Modal section in ad-onboard-reviewer.md
- Remove stale --benchmark.enabled flag from .vscode/launch.json
- Update copyright year to 2025-2026 in _onnx_schemas.py

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
45ffa73 to c44dda7
/bot run --disable-fail-fast

PR_Github #41502 [ run ] triggered by Bot. Commit:

PR_Github #41502 [ run ] completed with state
Fixes: #12712
This includes infrastructure-related changes made during the one-week AutoDeploy model onboarding sprint. Specific custom model files and tests will follow in another PR.
Consists of changes from #12209 related to existing infrastructure. Does not introduce any new onboarded models.
Summary by CodeRabbit
Release Notes
New Features
Improvements
Bug Fixes
Removals
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.