[#11166][infra] AutoDeploy: improve test organization in CI and add overview doc#11291
[#11166][infra] AutoDeploy: improve test organization in CI and add overview doc#11291lucaslie wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
There was a problem hiding this comment.
If I understand correctly, this files controls parallel test execution. @chzblych , @QiJune can you help me update this correctly?
All tests in unittest/auto_deploy/singlegpu can be executed in parallel except for the smoke test subfolder unittest/auto_deploy/singlegpu/smoke
How do I correctly specify this?
Also we currently have our tests run in H100, B200, and A30. Is this correctly reflected in this file?
There was a problem hiding this comment.
@tongyuantongyu is likely better suited to answer that question.
There was a problem hiding this comment.
Parallel execution is opt-in, so you need to add entries for tests that can be paralleled:
- unittest/auto_deploy/singlegpu/compile
- unittest/auto_deploy/singlegpu/custom_ops
- unittest/auto_deploy/singlegpu/models
- unittest/auto_deploy/singlegpu/shim
- unittest/auto_deploy/singlegpu/smoke
- unittest/auto_deploy/singlegpu/transformations
- unittest/auto_deploy/singlegpu/utils
And then you'll need to determine the suitable parallelism for each of them on each GPU. I'd recommend only enable parallel execution if the runtime is too long and unacceptable.
@chzblych Are we still using B200 Bring Up Board (those reporting GPU model as "NVIDIA Graphics Device") to run our CI? Are we running any unittest listed in this file on GB200 or (G)B300?
There was a problem hiding this comment.
I see, thank you for the feedback. So maybe it's best to remove these entries all together and maybe revisit it later if we have issues with long runtimes?
There was a problem hiding this comment.
If I do decide to parallelize, can I leave out the other entries?
i.e. is this sufficient:
- unittest/auto_deploy/singlegpu/custom_ops
or do I need do add something like this
- unittest/auto_deploy/singlegpu/custom_ops,NVIDIA B200,4,
with the extra parts ,NVIDIA B200,4, ? This is the part I don't quite understand
There was a problem hiding this comment.
So maybe it's best to remove these entries all together and maybe revisit it later if we have issues with long runtimes?
Since there was unittest/_torch/auto_deploy/unit/singlegpu ... entry for B200, I guess we found that it was slow before. I suggest parallelize a testcase if
- It's running on some GPU model that we are short on resource (Likely Hopper/Blackwell ones)
- It's running for longer than 10min
- Enabling parallel can reduce runtime a lot
This is the part I don't quite understand
The table head can help you understand them:
unittest_case_name,gpu,parallel_factor,comment
gpu (NVIDIA B200) is the GPU model name.
parallel_factor (4) is the parallel you want for this test case.
comment is an explanation and can be empty.
|
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast |
📝 WalkthroughWalkthroughThis change reorganizes the AutoDeploy test directory structure from Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/unittest/auto_deploy/singlegpu/transformations/library/test_rope_transformation.py (1)
1-1:⚠️ Potential issue | 🟡 MinorAdd NVIDIA copyright header.
This source file is missing the required NVIDIA copyright header. As per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header with the year of latest meaningful modification (2026).
📋 Proposed fix to add copyright header
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import pytesttests/unittest/auto_deploy/singlegpu/models/test_minimax_m2_patches.py (1)
1-1:⚠️ Potential issue | 🟠 MajorAdd required NVIDIA copyright header.
This Python source file is missing the NVIDIA copyright header with the year of latest meaningful modification. As per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header.
📄 Proposed fix: add copyright header
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + """Testing module patches that enable export of MiniMax-M2 model.
🤖 Fix all issues with AI agents
In `@docs/source/features/auto_deploy/advanced/testing_strategy.md`:
- Around line 9-23: The fenced code blocks showing the ASCII diagram and the
tests directory tree are missing a language tag which triggers MD040; update the
opening triple-backtick of both blocks to include a language (use "text") so
they start with ```text (e.g., the ASCII diagram block and the
tests/unittest/auto_deploy/ directory-tree block), ensuring the code fences
contain only the block content and not diff markers or extra +/- characters.
In
`@tests/unittest/auto_deploy/multigpu/transformations/library/test_bmm_sharding.py`:
- Around line 35-45: The forward method's docstring lists Args
(selected_experts, routing_weights) that are not parameters of the forward(self,
hidden_states: torch.Tensor) signature; update the docstring in the forward
function to only document hidden_states (its shape and meaning) or if those
extra params are needed, add them to the forward signature; locate the forward
method in this test class (function name forward) and remove the stale
selected_experts and routing_weights entries from the Args section so the
docstring matches the actual parameters.
In
`@tests/unittest/auto_deploy/singlegpu/transformations/library/test_quantization.py`:
- Line 1: The module docstring is incorrect ("Tests for basic graph sharding");
update it to accurately describe this file's contents (quantization
transformation tests). Replace the top-level string in test_quantization.py with
a concise description such as "Tests for quantization transformations (FP8,
NVFP4, INT4 AWQ, GPTQ)" so it matches the test cases in this module.
🧹 Nitpick comments (7)
tests/unittest/auto_deploy/singlegpu/transformations/library/test_attention_matcher_hf.py (1)
74-78: Prefer a comment for this local helper instead of a docstring.This nested helper is local to the enclosing test function; per guidelines, a comment is preferred here.
Suggested change
- """Ensure that there is exactly one torch.ops.auto_deploy.torch_attention call in the graph. - - The call should have layout="bsnd". Also check that there is no repeat_kv pattern left. - """ + # Ensure exactly one torch.ops.auto_deploy.torch_attention call in the graph + # with layout="bsnd", and confirm no repeat_kv pattern remains.As per coding guidelines "Use comments in Python for code within a function, or interfaces that are local to a file".
tests/unittest/auto_deploy/singlegpu/models/test_deepseek_patches.py (1)
16-25: Use Google-style docstring (Args/Returns) for_load_layer_from_model.The docstring currently uses “Parameters/Returns”, which doesn’t match the required Google-style format.
✅ Suggested update
- """Loads a specific layer/module from a model without loading the entire model. - - Parameters: - model_name_or_path (str): Path or name of the pretrained model. - layer_name (str): Name of the layer to extract. - - Returns: - module: The specified layer/module if available, otherwise None. - """ + """Loads a specific layer/module from a model without loading the entire model. + + Args: + model_name_or_path (str): Path or name of the pretrained model. + layer_name (str): Name of the layer to extract. + + Returns: + module: The specified layer/module if available, otherwise None. + """As per coding guidelines, use Google-style docstrings for Python functions.
tests/unittest/auto_deploy/singlegpu/transformations/library/test_fuse_rope_attention.py (2)
1-1: Consider updating the copyright year to 2026.Since this file is being modified in 2026, the copyright year should be updated to reflect the year of latest meaningful modification.
Proposed fix
-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2025-2026, NVIDIA CORPORATION. All rights reserved.As per coding guidelines: "All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification".
202-204: Pre-existing bug: Assert messages are missing f-string prefixes.While not part of this PR's changes, I noticed that the assert messages on lines 202, 204, and 214 use
{len(...)}syntax without thefprefix, so the variable values won't be interpolated. This would make debugging test failures harder.Proposed fix (optional, out of scope)
- assert len(fused_nodes) == 1, "Expected 1 AttentionPlugin node, got {len(fused_nodes)}" + assert len(fused_nodes) == 1, f"Expected 1 AttentionPlugin node, got {len(fused_nodes)}" input_nodes = gm_transformed.graph.find_nodes(op="placeholder") - assert len(input_nodes) == 5, "Expected 5 input nodes, got {len(input_nodes)}" + assert len(input_nodes) == 5, f"Expected 5 input nodes, got {len(input_nodes)}"- assert len(out_node.args[0]) == 2, "Expected 2 output nodes, got {len(out_node.args[0])}" + assert len(out_node.args[0]) == 2, f"Expected 2 output nodes, got {len(out_node.args[0])}"Also applies to: 214-214
tests/unittest/auto_deploy/singlegpu/transformations/library/test_rope_transformation.py (1)
31-34: Consider using proper Google-style docstring format.The docstring could be improved to follow Google-style conventions more closely by using a proper "Returns:" section header.
📝 Suggested improvement
- """Compute the frequency tensor for the complex multiplication RoPE variant. + """Compute the frequency tensor for the complex multiplication RoPE variant. - Returns a complex tensor of shape (seq_len, head_dim//2). + Returns: + torch.Tensor: A complex tensor of shape (seq_len, head_dim//2). """tests/unittest/auto_deploy/singlegpu/models/test_minimax_m2_patches.py (1)
31-64: Narrow the exception handling to specific error types.The try-except block catches generic
Exception, which is broader than necessary. Consider catching only the specific exceptions that can occur during model loading and configuration. As per coding guidelines, limit the except clause to the smallest set of errors possible.♻️ Proposed refactor: specify expected exceptions
- except Exception as e: + except (OSError, ValueError, ImportError, RuntimeError) as e: print(f"Error extracting layer: {e}") return Nonetests/unittest/auto_deploy/singlegpu/custom_ops/triton_kernels/test_triton_moe.py (1)
17-21: Use Google‑style docstring sections (Args/Returns) here.
This keeps the docstring Sphinx‑parsable and consistent with the rest of the codebase.♻️ Proposed docstring update
- """Reference implementation based on the provided previous algorithm. - - Produces used-region outputs (excluding sentinel-E blocks) for ground-truth comparison. - Returns: (sorted_token_ids_used[int64], expert_ids_used[int32], num_tokens_post_padded[int]) - """ + """Reference implementation based on the provided previous algorithm. + + Args: + topk_ids: Top-k expert IDs per token. + M: Number of tokens. + E: Number of experts. + top_k: Top-k value. + block_size_m: Tokens per block. + + Returns: + Tuple[torch.Tensor, torch.Tensor, int]: Sorted token IDs (used region), + expert IDs for used blocks, and number of tokens post padding. + """As per coding guidelines: Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx.
tests/unittest/auto_deploy/multigpu/transformations/library/test_bmm_sharding.py
Show resolved
Hide resolved
tests/unittest/auto_deploy/singlegpu/transformations/library/test_quantization.py
Outdated
Show resolved
Hide resolved
|
PR_Github #34852 [ run ] triggered by Bot. Commit: |
|
PR_Github #34852 [ run ] completed with state
|
… add overview doc Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
1ce30e3 to
94cb029
Compare
|
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast |
1 similar comment
|
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast |
|
PR_Github #35022 [ run ] triggered by Bot. Commit: |
|
PR_Github #35022 [ run ] completed with state
|
Summary by CodeRabbit
Release Notes
Documentation
Refactor
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]to print this help message.See details below for each supported subcommand.
Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id(OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.--disable-reuse-test(OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.--disable-fail-fast(OPTIONAL) : Disable fail fast on build/tests/infra failures.--skip-test(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.--stage-list "A10-PyTorch-1, xxx"(OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.--gpu-type "A30, H100_PCIe"(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.--test-backend "pytorch, cpp"(OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.--only-multi-gpu-test(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.--disable-multi-gpu-test(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.--add-multi-gpu-test(OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.--post-merge(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".--detailed-log(OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.--debug(OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in thestage-listparameter to access the appropriate container environment. Note: Does NOT update GitHub check status.For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.mdand the
scripts/test_to_stage_mapping.pyhelper.kill
killKill all running builds associated with pull request.
skip
skip --comment COMMENTSkip testing for latest commit on pull request.
--comment "Reason for skipping build/test"is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipelineReuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.