Fix rawBufferVectorLoad/Store to widen min precision types to 32-bit #8274

Draft
alsepkow wants to merge 1 commit into microsoft:main from alsepkow:user/alsepkow/fix-min-precision-vector-load
@alsepkow
Contributor

Summary

Fixes rawBufferVectorLoad/Store to use 32-bit element types (i32/f32) for min precision types (min16int, min16uint, min16float) instead of 16-bit (i16/f16). This matches how pre-SM6.9 RawBufferLoad handles min precision.

Resolves #8273

Root Cause

`TranslateBufLoad` in `HLOperationLower.cpp` creates the vector type directly from the min precision element type (i16/f16) without widening to i32/f32. This causes WARP (and potentially other drivers) to load/store 2 bytes per element instead of 4, mismatching the buffer layout.
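The layout mismatch can be illustrated on the CPU with a minimal sketch. This is not DXC or driver code: `cpuWrite32`, `loadNarrow`, and `loadWidened` are hypothetical names that only model a buffer written as 32-bit values and then read back with a 2-byte versus a 4-byte element stride.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the CPU side: writes each element as a 32-bit value.
std::vector<std::uint8_t> cpuWrite32(const std::array<std::int32_t, 3> &values) {
  std::vector<std::uint8_t> bytes(sizeof(values));
  std::memcpy(bytes.data(), values.data(), sizeof(values));
  return bytes;
}

// Models a v3i16 load: 2 bytes per element, so only the first 6 bytes are read.
std::array<std::int16_t, 3> loadNarrow(const std::vector<std::uint8_t> &bytes) {
  std::array<std::int16_t, 3> out{};
  std::memcpy(out.data(), bytes.data(), sizeof(out));
  return out;
}

// Models a v3i32 load followed by truncating each element to 16 bits.
std::array<std::int16_t, 3> loadWidened(const std::vector<std::uint8_t> &bytes) {
  std::array<std::int32_t, 3> wide{};
  std::memcpy(wide.data(), bytes.data(), sizeof(wide));
  return {static_cast<std::int16_t>(wide[0]),
          static_cast<std::int16_t>(wide[1]),
          static_cast<std::int16_t>(wide[2])};
}
```

With source values {1, 2, 3}, `loadWidened` recovers all three elements, while `loadNarrow` reads only the first 6 of the 12 bytes and returns {1, 0, 2} on a little-endian host.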

Fix

Apply the same widening pattern used for bool types:

  • Load: Widen element type to i32/f32, load via
    `rawBufferVectorLoad.v_i32/v_f32`, then `trunc`/`fptrunc` back
  • Store: `sext`/`fpext` to i32/f32, then store via
    `rawBufferVectorStore.v_i32/v_f32`
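As a rough value-level illustration of the two bullets above (not the IRBuilder code from this change), the integer path can be sketched in plain C++. `widenForStore` and `narrowAfterLoad` are hypothetical names. The float path (`fpext`/`fptrunc`) is omitted because standard C++ has no portable half type.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Store side: sign-extend each 16-bit element to 32 bits (models sext).
template <std::size_t N>
std::array<std::int32_t, N>
widenForStore(const std::array<std::int16_t, N> &narrow) {
  std::array<std::int32_t, N> wide{};
  for (std::size_t i = 0; i < N; ++i)
    wide[i] = static_cast<std::int32_t>(narrow[i]);
  return wide;
}

// Load side: narrow each 32-bit element back to 16 bits (models trunc).
// Values produced by widenForStore always fit in 16 bits, so the cast is exact.
template <std::size_t N>
std::array<std::int16_t, N>
narrowAfterLoad(const std::array<std::int32_t, N> &wide) {
  std::array<std::int16_t, N> narrow{};
  for (std::size_t i = 0; i < N; ++i)
    narrow[i] = static_cast<std::int16_t>(wide[i]);
  return narrow;
}
```

Widening before the store and narrowing after the load round-trips every 16-bit value unchanged, while the buffer itself only ever sees 4-byte elements.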

Testing

Added FileCheck test verifying all 3 min precision types produce i32/f32 vector load/store ops.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

RawBufferVectorLoad/Store for min precision types (min16int, min16uint,
min16float) was emitting i16/f16 vector operations (e.g., v3i16) which
causes WARP and potentially other drivers to load/store 2 bytes per
element instead of 4. This mismatches the buffer layout when the CPU
writes 32-bit values.

Pre-SM6.9 RawBufferLoad correctly handles this by loading as i32/f32
and truncating. Apply the same pattern for SM6.9 vector variants:
- RawBufferVectorLoad: load as v_i32/v_f32, then trunc to i16/half
- RawBufferVectorStore: sext/fpext to i32/f32, then store as v_i32/v_f32

This matches the existing bool widening pattern already in TranslateBufLoad.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

rawBufferVectorLoad/Store emits i16/f16 for min precision types instead of i32/f32
