[Feat]: add NPU fused operators (RMSNorm, RoPE, SwiGLU, SDPA) by ys2025-AI · Pull Request #194 · modelscope/twinkle

ys2025-AI · 2026-05-18T12:12:25Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Extends Twinkle's NPU support from a basic MoE GMM patch to a full fused-operator suite (RMSNorm, RoPE, SwiGLU, SDPA) for Ascend hardware.
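For readers less familiar with the operators named above, here is a plain-Python sketch of the math that RMSNorm and SwiGLU are commonly defined to compute. It is not the PR's implementation: the function names are hypothetical, the code works on a single hidden vector, and a fused kernel would be expected to produce the same values in fewer device launches.

```python
import math
from typing import List


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """Unfused reference: x / sqrt(mean(x^2) + eps) * weight, over one vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate: List[float], up: List[float]) -> List[float]:
    """Unfused reference: silu(gate) * up, element-wise."""
    return [silu(g) * u for g, u in zip(gate, up)]
```

A fused operator can be checked against a reference like this by comparing outputs within a small tolerance.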

Experiment results

Atlas 900 A2 (8× NPU) | Qwen3-30B-A3B-Instruct-2507 | LoRA r=8, batch=16, 188 steps | Dataset GSM8K_ZH

Metric	Baseline	This PR	Delta
Total	544 s	503 s	+7.5%
Training (step 10–180)	465 s	404 s	+13.1%
Loss / GradNorm	—	—	<< 0.01
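The Delta column appears to be the relative reduction in wall-clock time, which is an assumption inferred from the numbers rather than something the PR states. Under that reading, the two percentages can be reproduced as follows (the helper name is hypothetical):

```python
def time_reduction_pct(baseline_s: float, new_s: float) -> float:
    """Relative wall-clock reduction in percent: (baseline - new) / baseline."""
    return (baseline_s - new_s) / baseline_s * 100.0


print(round(time_reduction_pct(544, 503), 1))  # 7.5
print(round(time_reduction_pct(465, 404), 1))  # 13.1
```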

gemini-code-assist

Code Review

This pull request introduces comprehensive NPU hardware acceleration support for Ascend devices by implementing fused operators (RMSNorm, RoPE, SwiGLU, and SDPA) and monkey-patching logic for specific model families like Qwen. It also refactors the NPU patching mechanism to be applied automatically when an NPU device is detected. Review feedback focuses on improving error handling by logging tracebacks for broad exception catches and restoring type hints and assertions that were removed during the refactoring of the MoE grouped matrix multiplication functions.
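The error-handling feedback can be illustrated with a small sketch. It assumes a hypothetical registry of patch callables and is not the PR's code: each patch runs inside a broad `except`, but the traceback is logged rather than swallowed.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("npu_patch_sketch")  # hypothetical logger name


def apply_patches(patches: Dict[str, Callable[[], None]]) -> Dict[str, bool]:
    """Apply each patch; on failure, log the full traceback and continue."""
    results: Dict[str, bool] = {}
    for name, patch in patches.items():
        try:
            patch()
            results[name] = True
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("Failed to apply patch %r; keeping the unpatched path", name)
            results[name] = False
    return results
```

Returning a per-patch status keeps one failing operator from disabling the rest, while the log still shows where the failure came from.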

gemini-code-assist · 2026-05-18T12:15:08Z

-        assert x.size(1) == weight_ekn.size(1), (
-            f'input dim mismatch: x.shape={tuple(x.shape)}, weight_ekn.shape={tuple(weight_ekn.shape)}')
-
+    def forward(ctx, x, group_list, weight_ekn):


The type hints and assertions from the previous version of GmmFunction.forward have been removed. These are valuable for static analysis, code clarity, and preventing runtime errors. Please consider restoring the type hints and relevant assertions for tensor shapes and dimensions to maintain code quality.

Suggested change

def forward(ctx, x, group_list, weight_ekn):

def forward(ctx, x: torch.Tensor, group_list: torch.Tensor, weight_ekn: torch.Tensor):

gemini-code-assist · 2026-05-18T12:15:08Z

-    assert weight_ekn.size(0) == offs.numel(), (
-        f'weight_ekn.size(0)={weight_ekn.size(0)} != offs.numel()={offs.numel()}')
-
+def _grouped_mm_npu(input, weight_ekn, offs):


Similar to GmmFunction, the type hints and assertions have been removed from this function. Restoring them would improve code quality and make it easier to understand the expected inputs and outputs.

Suggested change

def _grouped_mm_npu(input, weight_ekn, offs):

def _grouped_mm_npu(input: torch.Tensor, weight_ekn: torch.Tensor, offs: torch.Tensor) -> torch.Tensor:
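To make the expected inputs concrete, here is a pure-Python reference sketch of a grouped matrix multiply with type hints and the two shape checks from the removed assertions. It rests on assumptions not confirmed by the PR: `offs` is taken to hold cumulative end-row indices (one per expert), `weight_ekn` is taken to have shape (E, K, N), and the function name is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def grouped_mm_reference(x: Matrix, weight_ekn: Sequence[Matrix], offs: Sequence[int]) -> Matrix:
    """Rows x[start:end] of each group are multiplied by that group's (K, N) weight."""
    assert len(weight_ekn) > 0, "need at least one expert weight"
    assert len(weight_ekn) == len(offs), (
        f"len(weight_ekn)={len(weight_ekn)} != len(offs)={len(offs)}")
    k_dim = len(weight_ekn[0])
    assert all(len(row) == k_dim for row in x), "input dim mismatch"
    assert offs[-1] == len(x), "last offset must equal the number of rows"

    out: Matrix = []
    start = 0
    for w_kn, end in zip(weight_ekn, offs):
        n_dim = len(w_kn[0])
        for row in x[start:end]:
            out.append([sum(row[k] * w_kn[k][j] for k in range(k_dim)) for j in range(n_dim)])
        start = end
    return out
```

With the checks in place, a mismatched expert count or hidden size fails immediately with a readable message instead of surfacing later inside the kernel.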

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

tastelikefeet · 2026-05-19T03:28:09Z

+
+    # Priority 2: Fallback to global NPU availability
+    try:
+        if hasattr(torch, 'npu') and torch.npu.is_available():


There is already a torch_util module; I suggest reusing that code.
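One way to act on this is to keep the availability check in a single shared helper and call it from every site. The sketch below is an assumption about what such a helper could look like, not the actual contents of torch_util; the function name is hypothetical, and it returns False when torch is not installed.

```python
import importlib.util


def is_npu_available() -> bool:
    """Return True only if torch is importable and reports an available NPU."""
    if importlib.util.find_spec("torch") is None:
        return False
    try:
        import torch

        # torch.npu only exists when an NPU backend has registered itself.
        return hasattr(torch, "npu") and bool(torch.npu.is_available())
    except Exception:
        return False
```

Call sites then reduce to `if is_npu_available(): ...`, so the fallback logic lives in one place.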

ys2025-AI added 2 commits May 18, 2026 19:54

Update __init__.py

b5aedc7

Update monkey_patch_npu.py

85c677a

gemini-code-assist Bot reviewed May 18, 2026

ys2025-AI and others added 3 commits May 18, 2026 20:25

Update src/twinkle/kernel/__init__.py

1ef0e38

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update src/twinkle/kernel/__init__.py

924d17c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Merge branch 'modelscope:main' into main

8472a43

tastelikefeet reviewed May 19, 2026

tastelikefeet approved these changes May 19, 2026

ys2025-AI wants to merge 5 commits into modelscope:main from ys2025-AI:main

2 participants
