[None][fix] Enable CUDA core fast path for SM121 (DGX Spark)#12705
mihai-chiorean wants to merge 1 commit into NVIDIA:main
Conversation
📝 Walkthrough

Updated CUDA-core enablement logic in the Linear module to recognize additional GPU architectures. The condition now enables CUDA cores for compute capabilities (12, 0) and (12, 1) in addition to (8, 9).
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes. Pre-merge checks: ✅ 3 passed.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/modules/linear.py (1)
`2572-2574`: **Centralize the CUDA-core capability allowlist to avoid cross-file drift.** This SM121 enablement is correct, but the same capability check is duplicated elsewhere and is already inconsistent (`tensorrt_llm/_torch/auto_deploy/custom_ops/quantization/quant.py:109-120` still excludes `(12, 1)`). Consider moving the allowlist into a shared helper/constant and reusing it in both places.

Suggested direction:

```diff
+# e.g., in a shared module
+CUDA_CORE_CAPABILITIES = {(8, 9), (12, 0), (12, 1)}
+
 # in Linear.__init__
 self.enable_cuda_core = False
 if torch.cuda.is_available():
     capability = torch.cuda.get_device_capability(torch.device('cuda:0'))
-    self.enable_cuda_core = (capability[0] == 8 and capability[1] == 9) \
-        or (capability[0] == 12 and capability[1] in (0, 1))
+    self.enable_cuda_core = capability in CUDA_CORE_CAPABILITIES
```

Apply the same shared constant/helper in `tensorrt_llm/_torch/auto_deploy/custom_ops/quantization/quant.py` to keep behavior aligned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/modules/linear.py` around lines 2572 - 2574, Centralize the CUDA core allowlist by adding a shared constant or helper (e.g., CUDA_CORE_ALLOWLIST or is_cuda_core_supported(capability)) and replace the inline capability check in LinearModule (where enable_cuda_core is set using capability) with a call/reference to that helper; then update the quantization code in tensorrt_llm/_torch/auto_deploy/custom_ops/quantization/quant.py to use the same helper so (12,1) is included consistently across both places. Ensure the shared symbol explicitly contains the tuples {(8,9), (12,0), (12,1)} (or logic that yields the same) and update references in the LinearModule (enable_cuda_core) and the quant module to use that single source of truth.
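The shared allowlist the review suggests could look like the following minimal sketch. The constant and helper names are illustrative assumptions, not the actual TensorRT-LLM symbols:

```python
# Hypothetical shared allowlist per the review suggestion; the names here
# are illustrative, not the actual TensorRT-LLM API.

# Compute capabilities that get the CUDA-core scaled-mm fast path:
# SM89, SM120, and (after this PR) SM121 (DGX Spark GB10).
CUDA_CORE_CAPABILITIES = {(8, 9), (12, 0), (12, 1)}


def is_cuda_core_supported(capability) -> bool:
    """True if a (major, minor) compute capability is on the allowlist.

    Callers would pass torch.cuda.get_device_capability(device) here, so
    linear.py and quant.py share one source of truth.
    """
    return tuple(capability) in CUDA_CORE_CAPABILITIES
```

A set-membership test also keeps future additions to the allowlist one-line changes instead of edits to a compound boolean expression.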
📒 Files selected for processing (1)
tensorrt_llm/_torch/modules/linear.py
The `enable_cuda_core` check in `NVFP4Linear` only matches SM89 and SM120 but not SM121 (DGX Spark GB10). `CudaCoreNVFP4Runner.MIN_SM_VERSION` is 100 so SM121 qualifies, but the linear module early-exit optimization bypasses the autotuner for small M dimensions (M <= 8), and this path was dead on SM121. Add capability (12, 1) to the check.

Signed-off-by: Mihai Chiorean <mihai.v.chiorean@gmail.com>
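The interaction the commit message describes can be sketched as follows. This is a simplified illustration of the gating logic, with a hypothetical `choose_gemm_path` function rather than the real module code:

```python
# Simplified sketch of the small-M gating described in the commit message;
# the function and its return labels are illustrative, not TensorRT-LLM code.
def choose_gemm_path(m: int, enable_cuda_core: bool) -> str:
    # For skinny GEMMs (M <= 8) the module skips the autotuner; the
    # CUDA-core scaled-mm path is only reachable if the init-time
    # capability check set enable_cuda_core. Before this fix the flag
    # was always False on SM121, so small-M calls fell through.
    if m <= 8 and enable_cuda_core:
        return "cuda_core_scaled_mm"
    return "autotuned_gemm"
```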
Force-pushed from 81ea2be to ee80a41
Summary
`enable_cuda_core` in `NVFP4Linear.__init__` and `_trtllm_fp8_prequant_linear_core()` gate the CUDA core scaled-mm fast path for small M dimensions (M <= 8). Both checks match SM89 `(8, 9)` and SM120 `(12, 0)` but not SM121 `(12, 1)`, leaving the fast path dead on DGX Spark GB10.

This adds `(12, 1)` to both checks so SM121 gets the same fast path as SM120.

Note on device-0 hardcoding
Both `linear.py:2571` (`torch.device("cuda:0")`) and `quant.py:111` (`torch.cuda.get_device_capability(0)`) query device 0 at init time rather than the device the tensors will actually live on. This is a pre-existing issue that affects SM89 and SM120 equally; fixing it requires moving the check to dispatch time, which is a larger refactor beyond this PR's scope.

Test plan
- `enable_cuda_core=True` on DGX Spark GB10 (SM121) after fix
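The dispatch-time alternative mentioned in the device-0 note could look roughly like this sketch; the function names and caching strategy are assumptions for illustration, not the planned refactor:

```python
# Hedged sketch of a dispatch-time check: query the capability of the
# tensor's own device instead of hardcoding cuda:0 at init time.
import functools


@functools.lru_cache(maxsize=None)
def _capability_of(device_index: int):
    # Imported lazily so the helper stays importable without CUDA present.
    import torch
    return torch.cuda.get_device_capability(device_index)


def cuda_core_enabled_for(tensor) -> bool:
    """Decide the fast path per tensor, caching one query per device."""
    if tensor.device.type != "cuda":
        return False
    return _capability_of(tensor.device.index or 0) in {(8, 9), (12, 0), (12, 1)}
```

Caching the per-device capability keeps the dispatch-time check as cheap as the current init-time one after the first call on each device.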