Conversation

@nirvedhmeshram
Contributor

@nirvedhmeshram nirvedhmeshram commented Dec 12, 2025

Currently we can run into undefined behavior when the incremented index values generated for prefetching wrap on signed overflow. This PR adds the no-signed-wrap (nsw) attribute to those operations to prevent that.
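For reference, a minimal sketch of the IR change this produces (taken from the updated test expectations below, assuming a unit-step index increment):

```mlir
// Before: the prefetch index increment carries no overflow assumption.
%iv1 = arith.addi %iv, %c1 : index

// After: the nsw flag asserts the addition cannot wrap on signed overflow,
// which matches the loop's semantics and rules out the undefined increment.
%iv1 = arith.addi %iv, %c1 overflow<nsw> : index
```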

@llvmbot
Member

llvmbot commented Dec 12, 2025

@llvm/pr-subscribers-mlir

Author: Nirvedh Meshram (nirvedhmeshram)

Changes

Currently we can run into undefined behavior in case of signed wrap on index values when prefetching. This PR adds a no signed wrap attribute to prevent that.


Full diff: https://github.com/llvm/llvm-project/pull/172066.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp (+12-6)
  • (modified) mlir/test/Dialect/SCF/loop-pipelining.mlir (+26-26)
diff --git a/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp b/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
index 7e7fba43fe9e0..6b87dcaf94b67 100644
--- a/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
@@ -283,7 +283,8 @@ LogicalResult LoopPipelinerInternal::emitPrologue(RewriterBase &rewriter) {
           arith::MulIOp::create(
               rewriter, loc, step,
               arith::ConstantOp::create(rewriter, loc,
-                                        rewriter.getIntegerAttr(t, i))));
+                                        rewriter.getIntegerAttr(t, i))),
+          arith::IntegerOverflowFlags::nsw);
       predicates[i] = arith::CmpIOp::create(rewriter, loc,
                                             arith::CmpIPredicate::slt, iv, ub);
     }
@@ -296,7 +297,8 @@ LogicalResult LoopPipelinerInternal::emitPrologue(RewriterBase &rewriter) {
         arith::MulIOp::create(
             rewriter, loc, step,
             arith::ConstantOp::create(rewriter, loc,
-                                      rewriter.getIntegerAttr(t, i))));
+                                      rewriter.getIntegerAttr(t, i))),
+        arith::IntegerOverflowFlags::nsw);
     setValueMapping(forOp.getInductionVar(), iv, i);
     for (Operation *op : opOrder) {
       if (stages[op] > i)
@@ -521,7 +523,8 @@ LogicalResult LoopPipelinerInternal::createKernel(
                 rewriter, forOp.getLoc(),
                 rewriter.getIntegerAttr(t, maxStage - stages[op])));
         Value iv = arith::AddIOp::create(rewriter, forOp.getLoc(),
-                                         newForOp.getInductionVar(), offset);
+                                         newForOp.getInductionVar(), offset,
+                                         arith::IntegerOverflowFlags::nsw);
         nestedNewOp->setOperand(operand->getOperandNumber(), iv);
         rewriter.setInsertionPointAfter(newOp);
         continue;
@@ -667,7 +670,8 @@ LoopPipelinerInternal::emitEpilogue(RewriterBase &rewriter,
                                            createConst(-1));
 
   Value rangeDiff = arith::SubIOp::create(rewriter, loc, ub, lb);
-  Value rangeIncrStep = arith::AddIOp::create(rewriter, loc, rangeDiff, step);
+  Value rangeIncrStep = arith::AddIOp::create(rewriter, loc, rangeDiff, step,
+                                              arith::IntegerOverflowFlags::nsw);
   Value rangeDecr =
       arith::AddIOp::create(rewriter, loc, rangeIncrStep, stepDecr);
   Value totalIterations =
@@ -686,12 +690,14 @@ LoopPipelinerInternal::emitEpilogue(RewriterBase &rewriter,
   for (int64_t i = 1; i <= maxStage; i++) {
     // newLastIter = lb + step * iterI
     Value newlastIter = arith::AddIOp::create(
-        rewriter, loc, lb, arith::MulIOp::create(rewriter, loc, step, iterI));
+        rewriter, loc, lb, arith::MulIOp::create(rewriter, loc, step, iterI),
+        arith::IntegerOverflowFlags::nsw);
 
     setValueMapping(forOp.getInductionVar(), newlastIter, i);
 
     // increment to next iterI
-    iterI = arith::AddIOp::create(rewriter, loc, iterI, one);
+    iterI = arith::AddIOp::create(rewriter, loc, iterI, one,
+                                  arith::IntegerOverflowFlags::nsw);
 
     if (dynamicLoop) {
       // Disable stages when `i` is greater than total_iters.
diff --git a/mlir/test/Dialect/SCF/loop-pipelining.mlir b/mlir/test/Dialect/SCF/loop-pipelining.mlir
index 86af637fc05d7..9828df0a0f3e8 100644
--- a/mlir/test/Dialect/SCF/loop-pipelining.mlir
+++ b/mlir/test/Dialect/SCF/loop-pipelining.mlir
@@ -14,7 +14,7 @@
 //  CHECK-SAME:     step %[[C1]] iter_args(%[[LARG:.*]] = %[[L0]]) -> (f32) {
 //  CHECK-NEXT:     %[[ADD0:.*]] = arith.addf %[[LARG]], %{{.*}} : f32
 //  CHECK-NEXT:     memref.store %[[ADD0]], %[[R]][%[[IV]]] : memref<?xf32>
-//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] : index
+//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[LR:.*]] = memref.load %[[A]][%[[IV1]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[LR]] : f32
 //  CHECK-NEXT:   }
@@ -87,7 +87,7 @@ func.func @iteration_lt_stage(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  CHECK-NEXT:     %[[ADD0:.*]] = scf.execute_region
 //  CHECK-NEXT:       arith.addf %[[LARG]], %{{.*}} : f32
 //       CHECK:     memref.store %[[ADD0]], %[[R]][%[[IV]]] : memref<?xf32>
-//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] : index
+//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[LR:.*]] = scf.execute_region
 //  CHECK-NEXT:       memref.load %[[A]][%[[IV1]]] : memref<?xf32>
 //       CHECK:     scf.yield %[[LR]] : f32
@@ -135,7 +135,7 @@ func.func @simple_pipeline_region(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  CHECK-SAME:     step %[[C3]] iter_args(%[[LARG0:.*]] = %[[L0]], %[[LARG1:.*]] = %[[L1]]) -> (f32, f32) {
 //  CHECK-NEXT:     %[[ADD0:.*]] = arith.addf %[[LARG0]], %{{.*}} : f32
 //  CHECK-NEXT:     memref.store %[[ADD0]], %[[R]][%[[IV]]] : memref<?xf32>
-//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C6]] : index
+//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C6]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[LR:.*]] = memref.load %[[A]][%[[IV1]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[LARG1]], %[[LR]] : f32, f32
 //  CHECK-NEXT:   }
@@ -175,7 +175,7 @@ func.func @simple_pipeline_step(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  CHECK-SAME:     %[[LARG:.*]] = %[[L1]]) -> (f32, f32) {
 //  CHECK-NEXT:     memref.store %[[ADDARG]], %[[R]][%[[IV]]] : memref<?xf32>
 //  CHECK-NEXT:     %[[ADD1:.*]] = arith.addf %[[LARG]], %{{.*}} : f32
-//  CHECK-NEXT:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C2]] : index
+//  CHECK-NEXT:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C2]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L3:.*]] = memref.load %[[A]][%[[IV2]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[ADD1]], %[[L3]] : f32, f32
 //  CHECK-NEXT:   }
@@ -224,7 +224,7 @@ func.func @simple_pipeline_step(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  NOEPILOGUE-NEXT:     } else {
 //  NOEPILOGUE-NEXT:       scf.yield %[[CF]] : f32
 //  NOEPILOGUE-NEXT:     }
-//  NOEPILOGUE-NEXT:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C2]] : index
+//  NOEPILOGUE-NEXT:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C2]] overflow<nsw> : index
 //  NOEPILOGUE-NEXT:     %[[L3:.*]] = scf.if %[[S0]] -> (f32) {
 //  NOEPILOGUE-NEXT:       %[[PL:.*]] = memref.load %[[A]][%[[IV2]]] : memref<?xf32>
 //  NOEPILOGUE-NEXT:       scf.yield %[[PL]] : f32
@@ -274,7 +274,7 @@ func.func @three_stage(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  CHECK-SAME:     %[[LA3:.*]] = %[[L3]]) -> (f32, f32, f32, f32) {
 //  CHECK-NEXT:     %[[ADD0:.*]] = arith.addf %[[LA0]], %{{.*}} : f32
 //  CHECK-NEXT:     memref.store %[[ADD0]], %[[R]][%[[IV]]] : memref<?xf32>
-//  CHECK-NEXT:     %[[IV4:.*]] = arith.addi %[[IV]], %[[C4]] : index
+//  CHECK-NEXT:     %[[IV4:.*]] = arith.addi %[[IV]], %[[C4]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L4:.*]] = memref.load %[[A]][%[[IV4]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[LA1]], %[[LA2]], %[[LA3]], %[[L4]] : f32, f32, f32, f32
 //  CHECK-NEXT:   }
@@ -326,7 +326,7 @@ func.func @long_liverange(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  CHECK-NEXT:     %[[ADD2:.*]] = arith.addf %[[LA2]], %{{.*}} : f32
 //  CHECK-NEXT:     %[[MUL1:.*]] = arith.mulf %[[ADDARG1]], %[[LA1]] : f32
 //  CHECK-NEXT:     memref.store %[[MULARG0]], %[[R]][%[[IV]]] : memref<?xf32>
-//  CHECK-NEXT:     %[[IV3:.*]] = arith.addi %[[IV]], %[[C3]] : index
+//  CHECK-NEXT:     %[[IV3:.*]] = arith.addi %[[IV]], %[[C3]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L3:.*]] = memref.load %[[A]][%[[IV3]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[LA2]], %[[L3]], %[[ADD2]], %[[MUL1]] : f32, f32, f32, f32
 //  CHECK-NEXT:   }
@@ -379,7 +379,7 @@ func.func @multiple_uses(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  CHECK-NEXT:     %[[MUL1:.*]] = scf.execute_region
 // arith.mulf %[[ADDARG1]], %[[LA1]] : f32
 //       CHECK:     memref.store %[[MULARG0]], %[[R]][%[[IV]]] : memref<?xf32>
-//  CHECK-NEXT:     %[[IV3:.*]] = arith.addi %[[IV]], %[[C3]] : index
+//  CHECK-NEXT:     %[[IV3:.*]] = arith.addi %[[IV]], %[[C3]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L3:.*]] = memref.load %[[A]][%[[IV3]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[LA2]], %[[L3]], %[[ADD2]], %[[MUL1]] : f32, f32, f32, f32
 //  CHECK-NEXT:   }
@@ -425,7 +425,7 @@ func.func @region_multiple_uses(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  CHECK-SAME:     step %[[C1]] iter_args(%[[C:.*]] = %[[CSTF]],
 //  CHECK-SAME:     %[[LARG:.*]] = %[[L0]]) -> (f32, f32) {
 //  CHECK-NEXT:     %[[ADD0:.*]] = arith.addf %[[LARG]], %[[C]] : f32
-//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] : index
+//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L1:.*]] = memref.load %[[A]][%[[IV1]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[ADD0]], %[[L1]] : f32, f32
 //  CHECK-NEXT:   }
@@ -464,7 +464,7 @@ func.func @loop_carried(%A: memref<?xf32>, %result: memref<?xf32>) {
 //  CHECK-SAME:     %[[ADDARG:.*]] = %[[ADD0]], %[[LARG:.*]] = %[[L1]]) -> (f32, f32, f32) {
 //  CHECK-NEXT:     %[[MUL0:.*]] = arith.mulf %[[ADDARG]], %[[CSTF]] : f32
 //  CHECK-NEXT:     %[[ADD1:.*]] = arith.addf %[[LARG]], %[[MUL0]] : f32
-//  CHECK-NEXT:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C2]] : index
+//  CHECK-NEXT:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C2]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L2:.*]] = memref.load %[[A]][%[[IV2]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[MUL0]], %[[ADD1]], %[[L2]] : f32, f32, f32
 //  CHECK-NEXT:   }
@@ -509,7 +509,7 @@ func.func @backedge_different_stage(%A: memref<?xf32>) -> f32 {
 //       CHECK:     %[[MUL0:.*]] = arith.mulf %[[ADDARG]], %[[CSTF]] : f32
 //       CHECK:     %[[ADD1:.*]] = scf.execute_region
 //  CHECK-NEXT:       arith.addf %[[LARG]], %[[MUL0]] : f32
-//       CHECK:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C2]] : index
+//       CHECK:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C2]] overflow<nsw> : index
 //       CHECK:     %[[L2:.*]] = scf.execute_region
 //  CHECK-NEXT:       memref.load %[[A]][%[[IV2]]] : memref<?xf32>
 //       CHECK:     scf.yield %[[MUL0]], %[[ADD1]], %[[L2]] : f32, f32, f32
@@ -558,7 +558,7 @@ func.func @region_backedge_different_stage(%A: memref<?xf32>) -> f32 {
 //  CHECK-SAME:     %[[LARG:.*]] = %[[L0]]) -> (f32, f32) {
 //  CHECK-NEXT:     %[[ADD0:.*]] = arith.addf %[[LARG]], %[[C]] : f32
 //  CHECK-NEXT:     %[[MUL0:.*]] = arith.mulf %[[ADD0]], %[[CSTF]] : f32
-//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] : index
+//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L2:.*]] = memref.load %[[A]][%[[IV1]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[MUL0]], %[[L2]] : f32, f32
 //  CHECK-NEXT:   }
@@ -603,11 +603,11 @@ func.func @backedge_same_stage(%A: memref<?xf32>) -> f32 {
 // CHECK:     %[[CV:.+]] = memref.subview %[[ARG2]]
 // CHECK:     linalg.generic
 // CHECK-SAME:  ins(%[[IA]], %[[IB]], %{{.*}} : {{.*}}) outs(%[[CV]] :
-// CHECK:     %[[NEXT:.+]] = arith.addi %[[IV]], %[[C1]]
+// CHECK:     %[[NEXT:.+]] = arith.addi %[[IV]], %[[C1]] overflow<nsw>
 // CHECK:     %[[ASV:.+]] = memref.subview %[[ARG0]][%[[NEXT]]] [8] [1] :
-// CHECK:     %[[NEXT:.+]] = arith.addi %[[IV]], %[[C1]] :
+// CHECK:     %[[NEXT:.+]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> :
 // CHECK:     %[[BSV:.+]] = memref.subview %[[ARG1]][%[[NEXT]]] [8] [1] :
-// CHECK:     %[[NEXT:.+]] = arith.addi %[[IV]], %[[C1]] :
+// CHECK:     %[[NEXT:.+]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> :
 // CHECK:     %[[BUFIDX:.+]] = affine.apply
 // CHECK:     %[[APROSV:.+]] = memref.subview %[[APRO]][%[[BUFIDX]], 0] [1, 8] [1, 1] :
 // CHECK:     %[[BPROSV:.+]] = memref.subview %[[BPRO]][%[[BUFIDX]], 0] [1, 8] [1, 1] :
@@ -679,10 +679,10 @@ func.func @pipeline_op_with_region(%A: memref<?xf32>, %B: memref<?xf32>, %result
 //  CHECK-NEXT:   %[[R:.*]]:3 = scf.for %[[IV:.*]] = %[[C0]] to %[[C3]]
 //  CHECK-SAME:     step %[[C1]] iter_args(%[[C:.*]] = %[[CSTF]],
 //  CHECK-SAME:     %[[ARG1:.*]] = %[[L0]], %[[ARG2:.*]] = %[[L1]]) -> (f32, f32, f32) {
-//  CHECK-NEXT:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C1]] : index
+//  CHECK-NEXT:     %[[IV2:.*]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L2:.*]] = memref.load %[[A]][%[[IV2]]] : memref<?xf32>
 //  CHECK-NEXT:     %[[MUL0:.*]] = arith.mulf %[[C]], %[[ARG1]] : f32
-//  CHECK-NEXT:     %[[IV3:.*]] = arith.addi %[[IV]], %[[C1]] : index
+//  CHECK-NEXT:     %[[IV3:.*]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[IV4:.*]] = arith.addi %[[IV3]], %[[C1]] : index
 //  CHECK-NEXT:     %[[L3:.*]] = memref.load %[[A]][%[[IV4]]] : memref<?xf32>
 //  CHECK-NEXT:     %[[MUL1:.*]] = arith.mulf %[[ARG2]], %[[MUL0]] : f32
@@ -778,9 +778,9 @@ func.func @stage_0_value_escape(%A: memref<?xf32>, %result: memref<?xf32>, %ub:
 //      NOEPILOGUE: %[[P_I0:.+]] = arith.cmpi slt, %[[LB]], %[[UB]] : index
 //      NOEPILOGUE: %[[L0:.+]] = scf.if %[[P_I0]] -> (f32) {
 // NOEPILOGUE-NEXT:   memref.load %[[A]][%[[LB]]] : memref<?xf32>
-//      NOEPILOGUE: %[[IV1:.+]] = arith.addi %[[LB]], %[[STEP]] : index
+//      NOEPILOGUE: %[[IV1:.+]] = arith.addi %[[LB]], %[[STEP]] overflow<nsw> : index
 //      NOEPILOGUE: %[[P_I1:.+]] = arith.cmpi slt, %[[IV1]], %[[UB]] : index
-//      NOEPILOGUE: %[[IV1_2:.+]] = arith.addi %[[LB]], %[[STEP]] : index
+//      NOEPILOGUE: %[[IV1_2:.+]] = arith.addi %[[LB]], %[[STEP]] overflow<nsw> : index
 //      NOEPILOGUE: %[[V0:.+]] = scf.if %[[P_I0]] -> (f32) {
 // NOEPILOGUE-NEXT:   arith.addf %[[L0]], %[[CSTF]] : f32
 //      NOEPILOGUE: %[[L1:.+]] = scf.if %[[P_I1]] -> (f32) {
@@ -795,7 +795,7 @@ func.func @stage_0_value_escape(%A: memref<?xf32>, %result: memref<?xf32>, %ub:
 //      NOEPILOGUE:   %[[V2:.+]] = scf.if %[[P_I3]] -> (f32) {
 //      NOEPILOGUE:     arith.addf %[[L2]], %[[CSTF]] : f32
 //      NOEPILOGUE:   %[[IT4:.+]] = arith.muli %[[STEP]], %[[C2]] : index
-//      NOEPILOGUE:   %[[IV3:.+]] = arith.addi %[[IV2]], %[[IT4]] : index
+//      NOEPILOGUE:   %[[IV3:.+]] = arith.addi %[[IV2]], %[[IT4]] overflow<nsw> : index
 //      NOEPILOGUE:   %[[L3:.+]] = scf.if %[[P_I2]] -> (f32) {
 //      NOEPILOGUE:     memref.load %[[A]][%[[IV3]]] : memref<?xf32>
 //      NOEPILOGUE:   scf.yield %[[V2]], %[[L3]] : f32, f32
@@ -824,11 +824,11 @@ func.func @stage_0_value_escape(%A: memref<?xf32>, %result: memref<?xf32>, %ub:
 //        CHECK:   %[[SUBI_17:.*]] = arith.subi %[[DIVSI_15]], %[[C2]]
 //        CHECK:   %[[MAXSI_18:.*]] = arith.maxsi %[[SUBI_17]], %[[C0]]
 //        CHECK:   %[[MULI_19:.*]] = arith.muli %[[STEP]], %[[MAXSI_18]]
-//        CHECK:   %[[ADDI_20:.*]] = arith.addi %[[LB]], %[[MULI_19]]
-//        CHECK:   %[[ADDI_21:.*]] = arith.addi %[[MAXSI_18]], %[[C1]]
+//        CHECK:   %[[ADDI_20:.*]] = arith.addi %[[LB]], %[[MULI_19]] overflow<nsw>
+//        CHECK:   %[[ADDI_21:.*]] = arith.addi %[[MAXSI_18]], %[[C1]] overflow<nsw>
 //        CHECK:   %[[CMPI_22:.*]] = arith.cmpi sge, %[[DIVSI_15]], %[[C1]]
 //        CHECK:   %[[MULI_23:.*]] = arith.muli %[[STEP]], %[[ADDI_21]]
-//        CHECK:   %[[ADDI_24:.*]] = arith.addi %[[LB]], %[[MULI_23]]
+//        CHECK:   %[[ADDI_24:.*]] = arith.addi %[[LB]], %[[MULI_23]] overflow<nsw>
 //        CHECK:   %[[CMPI_25:.*]] = arith.cmpi sge, %[[DIVSI_15]], %[[C2]]
 //        CHECK:   scf.if %[[CMPI_22]] {
 //        CHECK:     memref.store %{{.*}}#0, %{{.*}}[%[[ADDI_20]]]
@@ -863,7 +863,7 @@ func.func @dynamic_loop(%A: memref<?xf32>, %result: memref<?xf32>, %lb: index, %
 //       NOEPILOGUE:       %[[CMPI_4:.*]] = arith.cmpi slt, %[[ARG5]], %[[SUBI_3]]
 //       NOEPILOGUE:       %[[ADDF_5:.*]] = arith.addf %[[ARG7]], %[[ARG6]]
 //       NOEPILOGUE:       %[[MULF_6:.*]] = arith.mulf %[[ADDF_5]], %{{.*}}
-//       NOEPILOGUE:       %[[ADDI_7:.*]] = arith.addi %[[ARG5]], %{{.*}}
+//       NOEPILOGUE:       %[[ADDI_7:.*]] = arith.addi %[[ARG5]], %{{.*}} overflow<nsw>
 //       NOEPILOGUE:       %[[IF_8:.*]] = scf.if %[[CMPI_4]]
 //       NOEPILOGUE:         %[[LOAD_9:.*]] = memref.load %{{.*}}[%[[ADDI_7]]]
 //       NOEPILOGUE:         scf.yield %[[LOAD_9]]
@@ -884,7 +884,7 @@ func.func @dynamic_loop(%A: memref<?xf32>, %result: memref<?xf32>, %lb: index, %
 //       CHECK:   %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
 //       CHECK:       %[[ADDF_13:.*]] = arith.addf %[[ARG7]], %[[ARG6]]
 //       CHECK:       %[[MULF_14:.*]] = arith.mulf %[[ADDF_13]], %{{.*}}
-//       CHECK:       %[[ADDI_15:.*]] = arith.addi %[[ARG5]], %{{.*}}
+//       CHECK:       %[[ADDI_15:.*]] = arith.addi %[[ARG5]], %{{.*}} overflow<nsw>
 //       CHECK:       %[[LOAD_16:.*]] = memref.load %{{.*}}[%[[ADDI_15]]]
 //       CHECK:       scf.yield %[[MULF_14]], %[[LOAD_16]]
 //       CHECK:     }
@@ -940,7 +940,7 @@ func.func @dynamic_loop_result(%A: memref<?xf32>, %result: memref<?xf32>, %lb: i
 //  CHECK-NEXT:     %[[ADD0:.*]] = arith.addf %[[ARG1]], %[[ARG0]] : f32
 //  CHECK-NEXT:     %[[MUL0:.*]] = arith.mulf %[[ADD0]], %[[CST0]] : f32
 //  CHECK-NEXT:     memref.store %[[MUL0]], %[[A]][%[[IV]]] : memref<?xf32>
-//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] : index
+//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L2:.*]] = memref.load %[[A]][%[[IV1]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[CST0]], %[[L2]] : f32
 //  CHECK-NEXT:   }

@llvmbot
Member

llvmbot commented Dec 12, 2025

@llvm/pr-subscribers-mlir-scf

 //        CHECK:   %[[MULI_19:.*]] = arith.muli %[[STEP]], %[[MAXSI_18]]
-//        CHECK:   %[[ADDI_20:.*]] = arith.addi %[[LB]], %[[MULI_19]]
-//        CHECK:   %[[ADDI_21:.*]] = arith.addi %[[MAXSI_18]], %[[C1]]
+//        CHECK:   %[[ADDI_20:.*]] = arith.addi %[[LB]], %[[MULI_19]] overflow<nsw>
+//        CHECK:   %[[ADDI_21:.*]] = arith.addi %[[MAXSI_18]], %[[C1]] overflow<nsw>
 //        CHECK:   %[[CMPI_22:.*]] = arith.cmpi sge, %[[DIVSI_15]], %[[C1]]
 //        CHECK:   %[[MULI_23:.*]] = arith.muli %[[STEP]], %[[ADDI_21]]
-//        CHECK:   %[[ADDI_24:.*]] = arith.addi %[[LB]], %[[MULI_23]]
+//        CHECK:   %[[ADDI_24:.*]] = arith.addi %[[LB]], %[[MULI_23]] overflow<nsw>
 //        CHECK:   %[[CMPI_25:.*]] = arith.cmpi sge, %[[DIVSI_15]], %[[C2]]
 //        CHECK:   scf.if %[[CMPI_22]] {
 //        CHECK:     memref.store %{{.*}}#0, %{{.*}}[%[[ADDI_20]]]
@@ -863,7 +863,7 @@ func.func @dynamic_loop(%A: memref<?xf32>, %result: memref<?xf32>, %lb: index, %
 //       NOEPILOGUE:       %[[CMPI_4:.*]] = arith.cmpi slt, %[[ARG5]], %[[SUBI_3]]
 //       NOEPILOGUE:       %[[ADDF_5:.*]] = arith.addf %[[ARG7]], %[[ARG6]]
 //       NOEPILOGUE:       %[[MULF_6:.*]] = arith.mulf %[[ADDF_5]], %{{.*}}
-//       NOEPILOGUE:       %[[ADDI_7:.*]] = arith.addi %[[ARG5]], %{{.*}}
+//       NOEPILOGUE:       %[[ADDI_7:.*]] = arith.addi %[[ARG5]], %{{.*}} overflow<nsw>
 //       NOEPILOGUE:       %[[IF_8:.*]] = scf.if %[[CMPI_4]]
 //       NOEPILOGUE:         %[[LOAD_9:.*]] = memref.load %{{.*}}[%[[ADDI_7]]]
 //       NOEPILOGUE:         scf.yield %[[LOAD_9]]
@@ -884,7 +884,7 @@ func.func @dynamic_loop(%A: memref<?xf32>, %result: memref<?xf32>, %lb: index, %
 //       CHECK:   %{{.*}}:2 = scf.for %[[ARG5:.*]] = %[[LB:.*]] to %[[UBM]] step %[[STEP:.*]] iter_args(%[[ARG6:.*]] = %{{.*}}, %[[ARG7:.*]] = %{{.*}})
 //       CHECK:       %[[ADDF_13:.*]] = arith.addf %[[ARG7]], %[[ARG6]]
 //       CHECK:       %[[MULF_14:.*]] = arith.mulf %[[ADDF_13]], %{{.*}}
-//       CHECK:       %[[ADDI_15:.*]] = arith.addi %[[ARG5]], %{{.*}}
+//       CHECK:       %[[ADDI_15:.*]] = arith.addi %[[ARG5]], %{{.*}} overflow<nsw>
 //       CHECK:       %[[LOAD_16:.*]] = memref.load %{{.*}}[%[[ADDI_15]]]
 //       CHECK:       scf.yield %[[MULF_14]], %[[LOAD_16]]
 //       CHECK:     }
@@ -940,7 +940,7 @@ func.func @dynamic_loop_result(%A: memref<?xf32>, %result: memref<?xf32>, %lb: i
 //  CHECK-NEXT:     %[[ADD0:.*]] = arith.addf %[[ARG1]], %[[ARG0]] : f32
 //  CHECK-NEXT:     %[[MUL0:.*]] = arith.mulf %[[ADD0]], %[[CST0]] : f32
 //  CHECK-NEXT:     memref.store %[[MUL0]], %[[A]][%[[IV]]] : memref<?xf32>
-//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] : index
+//  CHECK-NEXT:     %[[IV1:.*]] = arith.addi %[[IV]], %[[C1]] overflow<nsw> : index
 //  CHECK-NEXT:     %[[L2:.*]] = memref.load %[[A]][%[[IV1]]] : memref<?xf32>
 //  CHECK-NEXT:     scf.yield %[[CST0]], %[[L2]] : f32
 //  CHECK-NEXT:   }

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
@nirvedhmeshram
Contributor Author

The issues with overflow were observed in a downstream project iree-org/iree#22887

Contributor

@ThomasRaoux ThomasRaoux left a comment

That seems like a big gun; could you point out which math operation overflows? I expect most of them cannot overflow (assuming the original induction variable doesn't overflow).

Comment on lines +280 to +287

       // pred = ub > lb + (i * step)
       Value iv = arith::AddIOp::create(
           rewriter, loc, lb,
           arith::MulIOp::create(
               rewriter, loc, step,
               arith::ConstantOp::create(rewriter, loc,
-                                        rewriter.getIntegerAttr(t, i))));
+                                        rewriter.getIntegerAttr(t, i))),
+          arith::IntegerOverflowFlags::nsw);
Contributor

how can lb + i * step overflow?

Contributor

how can lb + i * step overflow?

Why can't this overflow? I might be missing something here.

Contributor

Well, it can, but that would mean the original loop induction variable also overflows.

Contributor

Yeah, exactly. I don't know how to phrase the question yet, but I think we can add this nsw trait if the original loop has some way of saying lb + i * step does not overflow. Still formalizing the whole thing, so I am not ready to raise a broader question here yet.

Contributor

Yeah, I read too fast, but as @krzysz00 pointed out, nsw is an optimization and shouldn't fix functional problems, so something else is off.

Contributor

@krzysz00 krzysz00 left a comment

Adding nsw to a computation that overflows actually makes it more undefined. nsw is the assertion that the addition in question will not overflow on pain of nasal demons.

If you don't put in an nsw, you're signing up for the overflow semantics of your target, which in LLVM, is the usual 2's complement wrap-around.

Please provide more context.

(If the intent of this change is to exploit a property of scf.for, could you remind me where it's documented that overflow isn't allowed for loop IVs?)

@nirvedhmeshram
Contributor Author

Fair points. I will further study the downstream issue to understand why we need this. The case in which the error happens doesn't look like it should overflow, yet it produces a numeric error without this flag anyway. My only guess was that this behavior somehow gets propagated.

@krzysz00 krzysz00 self-requested a review December 12, 2025 19:48
Contributor

@krzysz00 krzysz00 left a comment

Ok, so, to restate the block:

This change is apparently one possible workaround for a yet-unknown bug.

We shouldn't be landing that sort of thing.

Once the bug is identified, we can, in the future, discuss whether or not scf.for has an nsw loop iterator (that is, whether we've got C's signed-overflow-is-UB semantics here) and apply this patch with a title like "Use the fact that scf.for can't overflow to improve its lowerings"

@nirvedhmeshram
Contributor Author

Closing this: as inferred by others, the problem is elsewhere in the stack. The nsw flag was just disabling a possible CSE of the add ops; that CSE is causing a (yet to be debugged) issue with control-flow basic blocks taking in extra arguments, which is then leading to numeric issues.
