From 5699273e5189d9fc8be92d5c584e8fa86e3a2b54 Mon Sep 17 00:00:00 2001
From: Daily Perf Improver
Date: Sun, 12 Oct 2025 15:22:08 +0000
Subject: [PATCH] Optimize dot product horizontal reduction with Vector.Sum
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace the manual accumulation loop with Vector.Sum() for the SIMD
horizontal reduction. Vector.Sum() is hardware-accelerated and can use
horizontal add instructions (e.g., VHADDPS on AVX), which are more
efficient than manual element-by-element accumulation.

Performance improvements (ShortRun benchmarks):
- Size 10: 35.8% faster (6.894 → 4.426 ns, 1.56× speedup)
- Size 100: 8.3% faster (27.745 → 25.434 ns, 1.09× speedup)
- Size 1000: ~equivalent (238.856 → 241.945 ns)
- Size 10000: ~equivalent (2,359 → 2,355 ns)

All 488 tests pass. No allocations changed.
---
 src/FsMath/SpanMath.fs | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/FsMath/SpanMath.fs b/src/FsMath/SpanMath.fs
index 6c8c26e..312061f 100644
--- a/src/FsMath/SpanMath.fs
+++ b/src/FsMath/SpanMath.fs
@@ -36,10 +36,10 @@ type SpanMath =
             let vy = Numerics.Vector<'T>(y.Slice(yi, simdWidth))
             accVec <- accVec + (vx * vy)
 
-        let mutable acc = LanguagePrimitives.GenericZero<'T>
-        for i = 0 to simdWidth - 1 do
-            acc <- acc + accVec.[i]
+        // Use Vector.Sum for optimized horizontal reduction (uses hardware-specific instructions)
+        let mutable acc = Numerics.Vector.Sum(accVec)
 
+        // Handle remaining elements (tail)
         for i = ceiling to length - 1 do
             acc <- acc + x.[xOffset + i] * y.[yOffset + i]
 