
[PyTorch] Fix cross_entropy_forward stride guard for non-contiguous input#2746

Merged
yaox12 merged 3 commits into NVIDIA:main from Bias92:fix/cross-entropy-forward-stride
Mar 10, 2026

Conversation

Contributor

@Bias92 commented Mar 9, 2026

The stride guard in cross_entropy_forward only checks stride(-1) != 1,
which misses transposed tensors where stride(-1) == 1 but stride(-2) != shape[-1].
The Triton kernel then uses the wrong row stride and produces silently incorrect results.

Added stride(-2) check, same approach as the backward fix in #2402.

Fixes #2734
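For illustration, a minimal reproduction sketch of the offending stride pattern (shapes are arbitrary, not taken from the issue):

```python
# Hypothetical repro of the stride pattern the old guard missed.
# A transposed view keeps stride(-1) == 1 while its row stride no
# longer equals the row length, so the Triton kernel would step
# between rows at the wrong offset.
import torch

SQ, B, V = 64, 2, 1024
x = torch.rand(SQ, B, V)   # contiguous: strides (B*V, V, 1) = (2048, 1024, 1)
y = x.transpose(0, 1)      # shape (B, SQ, V), strides (V, B*V, 1) = (1024, 2048, 1)

assert y.stride(-1) == 1             # old guard `stride(-1) != 1` lets this through
assert y.stride(-2) != y.shape[-1]   # but the row stride (2048) != row length (1024)
```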

Before fix:
Non-contiguous: [2.0794, 2.0794, 2.0794, 2.0794] ← wrong (same values)
Contiguous:     [4.1277, 3.7957, 2.1120, 2.5712] ← correct

After fix:
Both return [4.1277, 3.7957, 2.1120, 2.5712] ✓

Signed-off-by: Bias92 <pewpewplay315@gmail.com>
Contributor

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR fixes a silent correctness bug in cross_entropy_forward where non-contiguous tensors with stride(-1) == 1 but stride(-2) != shape[-1] (e.g., a transposed tensor) bypassed the existing contiguity guard and caused the Triton kernel to use the wrong row stride, producing silently incorrect loss values.

Key changes:

  • transformer_engine/pytorch/triton/cross_entropy.py: Extends the stride guard from stride(-1) != 1 to also check stride(-2) != shape[-1], matching the pattern of the backward fix in PR #2402 ("Make grad_output contiguous in cross_entropy.py"). When triggered, the input is made contiguous before being passed to the Triton kernels, which also implicitly fixes the backward pass.
  • tests/pytorch/test_parallel_cross_entropy.py: Adds regression test test_non_contiguous_transposed_input that creates a transposed (B, SQ, V) tensor, verifies the preconditions (stride(-1)==1, stride(-2) != shape[-1]), and validates that forward loss matches the result from an explicitly contiguous input.

Confidence Score: 5/5

  • This PR is safe to merge — it is a minimal, targeted fix applying a well-established guard pattern already present in the backward pass.
  • The one-line fix is correct, consistent with the existing pattern in the codebase (PR #2402, "Make grad_output contiguous in cross_entropy.py"), and the regression test properly exercises the bug path. The fix correctly catches the missed stride pattern and prevents silent correctness bugs. No safety concerns identified.
  • No files require special attention.

Last reviewed commit: c03ceb0


  # ensure _input and target are contiguous in the last dimension
- if _input.stride(-1) != 1:
+ if _input.stride(-1) != 1 or _input.stride(-2) != _input.shape[-1]:
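The effect of the change can be sketched with plain tuples standing in for tensor metadata (illustrative only, not the repo's code):

```python
# Sketch: why the old one-condition guard misses a transposed tensor.
# Shapes/strides are plain tuples here; the names are hypothetical.

def row_major_strides(shape):
    """Element strides of a contiguous (row-major) tensor of this shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

def old_guard(shape, strides):
    # original check: only the innermost stride
    return strides[-1] != 1

def new_guard(shape, strides):
    # fixed check: the row stride must also equal the row length
    return strides[-1] != 1 or strides[-2] != shape[-1]

SQ, B, V = 64, 2, 1024
base_shape = (SQ, B, V)
base_strides = row_major_strides(base_shape)  # (2048, 1024, 1)

# transpose(0, 1) swaps the first two entries of both shape and strides
t_shape = (B, SQ, V)
t_strides = (base_strides[1], base_strides[0], base_strides[2])  # (1024, 2048, 1)

print(old_guard(t_shape, t_strides))  # False: old guard lets the bad layout through
print(new_guard(t_shape, t_strides))  # True: new guard triggers a .contiguous() copy
```

For the contiguous base tensor both guards stay quiet, so the fix adds no copies on the common path.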
No regression test for the fixed scenario

The fix correctly catches tensors where stride(-1) == 1 but stride(-2) != shape[-1] (e.g., a tensor of shape (SQ, B, V) transposed to (B, SQ, V) via .transpose(0, 1)). However, no test is added to cover this specific case.

The existing test_swapped_input in tests/pytorch/test_parallel_cross_entropy.py creates a naturally-shaped (SQ, batch, vocab) tensor (contiguous, stride(-2) == V), so it does not exercise the bug path described in the PR.

A minimal regression test would look like:

def test_non_contiguous_transposed_input(self):
    """Regression test for stride(-2) != shape[-1] on non-contiguous input."""
    self.generate_iters(3)
    self.generate_infra(True, 0)
    for _ in range(self.iters):
        batch, SQ, vocab = 2, 64, 1024
        # shape (SQ, batch, vocab) transposed → (batch, SQ, vocab)
        # stride(-1)==1 but stride(-2)==batch*vocab != vocab  ← the old guard missed this
        x = torch.rand((SQ, batch, vocab), dtype=torch.float32, device="cuda").transpose(0, 1)
        assert x.stride(-1) == 1 and x.stride(-2) != x.shape[-1]  # confirm the guard is needed
        self.input_test = x
        ...  # drive through one_iteration_test

Without a test, this class of regression can silently reappear.

Contributor Author

@Bias92 commented Mar 9, 2026

Tested locally on RTX 4060 Ti (WSL2):

  • Reproduced the bug with transposed input (Match: False)
  • Verified fix produces correct results (Match: True)
  • Checked no other incomplete stride guards in the file

Full test suite needs CI — local editable build had C++ extension issues on WSL2.

Bias92 and others added 2 commits March 10, 2026 00:20
Member

ptrendx commented Mar 9, 2026

/te-ci pytorch

Member

ptrendx commented Mar 9, 2026

Thank you for your contribution @Bias92. I will merge once the CI passes.

@yaox12 yaox12 merged commit e6d97ff into NVIDIA:main Mar 10, 2026
20 of 24 checks passed

Development

Successfully merging this pull request may close these issues.

[PyTorch] cross_entropy_forward: stride guard misses non-contiguous transposed input

3 participants