
Execution Tests: Add min precision test cases to the long vector test#8260

Open
alsepkow wants to merge 32 commits into microsoft:main from alsepkow:user/alsepkow/MinPrecision

Conversation


@alsepkow alsepkow commented Mar 11, 2026

This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — D3D12_SHADER_MIN_PRECISION_SUPPORT only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing DxilConf_SM69_Vectorized_Core class alongside other types.

Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage.

Key design decisions

Full-precision buffer I/O: Min precision types have implementation-defined buffer storage width, so we use full-precision types (float/int/uint) for all Load/Store operations via the IO_TYPE/IO_OUT_TYPE shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation.

Half-precision tolerances: Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16-bit, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less.

Test coverage mirrors existing patterns:

  • min16float mirrors HLSLHalf_t (float/trig/math/comparison/dot/cast/derivative/wave/quad/load-store)
  • min16int mirrors int16_t (arithmetic/bitwise/comparison/reduction/cast/wave/quad/load-store)
  • min16uint mirrors uint16_t (arithmetic/bitwise/comparison/cast/wave/quad/load-store)

Wave and quad op support: Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use #ifdef MIN_PRECISION guards to store results via IO_OUT_TYPE for deterministic buffer layout without changing DXIL for existing non-min-precision tests.

Excluded operations:

  • Signed div/mod on min16int: HLSL does not support signed integer division on min precision types
  • Bit shifting on min16int/min16uint: Not supported for min precision types
  • FP specials (INF/NaN/denorm): min precision types do not support them

Resolves #7780

The array accessor and wave/quad op tests for min precision require the optimizer fix from: #8269

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

alsepkow and others added 7 commits March 10, 2026 17:20
Add device support helper, wrapper types, input data sets, type
registration, and validation for min16float, min16int, min16uint.

- doesDeviceSupportMinPrecision() checks D3D12 MinPrecisionSupport
- HLSLMin16Float_t/HLSLMin16Int_t/HLSLMin16Uint_t wrapper structs
  (32-bit storage, matching DXIL layout without -enable-16bit-types)
- Input data constrained to 16-bit representable range
- DATA_TYPE registrations and isFloatingPointType/isMinPrecisionType traits
- doValuesMatch overloads: min16float compares in half-precision space
  (reuses CompareHalfULP/CompareHalfEpsilon), integers use exact match
- TrigonometricValidation specializations matching HLSLHalf_t tolerances

Part of: microsoft#7780

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add DxilConf_SM69_Vectorized_MinPrecision test class with HLK_TEST_MINP
and HLK_WAVEOP_TEST_MINP macros. Mirrors 16-bit counterpart coverage
(HLSLHalf_t/int16_t/uint16_t) minus documented exclusions.

- New test class with Kits.Specification =
  Device.Graphics.D3D12.DXILCore.ShaderModel69.MinPrecision
- setupClass skips when device lacks min precision support
- ~160 test entries across 3 types (min16float/min16int/min16uint)
- MakeDifferent overloads in ShaderOpArith.xml (not gated by
  __HLSL_ENABLE_16_BIT since min precision is always available)
- Excluded: FP specials, AsType, Cast, bit-manipulation ops

Part of: microsoft#7780

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make all but one conversion operator explicit per wrapper type to
avoid C2666 ambiguity with built-in arithmetic operators. Matches
HLSLHalf_t pattern: one implicit conversion to the natural type
(float/int32_t/uint32_t), all others explicit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The implicit operator float() combined with implicit constructors from
int/uint32_t created ambiguity for expressions like 'A + 4': the
compiler could not choose between member operator+(HLSLMin16Float_t)
via constructor and built-in float+int via conversion. Making the
int/uint constructors explicit eliminates the member operator+ path
for int literals while preserving T(0) direct construction and
implicit float conversion for std:: math functions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HLSLMin16Int_t: add uint32_t and uint64_t constructors for
static_cast<T>(UINT) and static_cast<T>(size_t) patterns used in
shift masking and wave ops. Add operator~() for bitwise NOT in
WaveMultiPrefixBit ops.

HLSLMin16Uint_t: add operator~() for the same reason.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UnaryMathAbs: extend unsigned check to include HLSLMin16Uint_t
(std::is_unsigned_v is false for class types, so abs was called
with ambiguous overloads via implicit operator uint32_t).

MaskShiftAmount: change constexpr to const since wrapper types are
not literal types (no constexpr constructors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
alsepkow and others added 2 commits March 10, 2026 18:09
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ~27 Cast test entries (CastToBool, CastToInt16, CastToInt32,
CastToInt64, CastToUint16/32/64, CastToFloat16/32) for all three
min precision types. The generic Cast templates work via the single
implicit conversion operator on each wrapper type — C-style casts
chain through it (e.g. (int32_t)min16float goes float->int32_t).

Remove explicit conversion operators (operator double, operator
int32_t, etc.) that were not exercised since Cast tests were not
previously included and no other code paths use them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@alsepkow alsepkow changed the title Execution Tetsts: Add min precision test cases to the long vector test Execution Tests: Add min precision test cases to the long vector test Mar 11, 2026
alsepkow and others added 8 commits March 11, 2026 15:56
WaveMultiPrefixBitAnd/BitOr/BitXor use the any_int type set (g_AnyIntCT)
which is defined as {int16, int32, int64, uint16, uint32, uint64} and does
not include min precision integer types (min16int, min16uint). Remove the 6
invalid test entries and the now-unused operator~() from both integer
wrapper types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move all min precision test entries (min16float, min16int, min16uint)
from the separate DxilConf_SM69_Vectorized_MinPrecision class into
DxilConf_SM69_Vectorized_Core, using HLK_TEST/HLK_WAVEOP_TEST macros.

Remove the HLK_TEST_MINP and HLK_WAVEOP_TEST_MINP macro definitions,
the DxilConf_SM69_Vectorized_MinPrecision class, and the
doesDeviceSupportMinPrecision utility function since min precision
support checking is not required.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Min precision types (min16float, min16int, min16uint) are hints that
allow hardware to use any precision >= the specified minimum, making
buffer storage width implementation-defined. Add IO_TYPE/IO_OUT_TYPE
compiler defines that map min precision types to their full-precision
equivalents (float, int, uint) for buffer Load/Store operations. For
all other types, IO_TYPE equals TYPE and IO_OUT_TYPE equals OUT_TYPE.

This ensures deterministic buffer data layout regardless of the
device's min precision implementation, while still testing min
precision computation via explicit casts between the I/O types and
the min precision types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HLSL does not support signed integer division on minimum-precision
types. The compiler rejects these with: 'signed integer division is
not supported on minimum-precision types, cast to int to use 32-bit
division'. Remove the Divide and Modulus test entries for
HLSLMin16Int_t.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace min16float input values that are not exactly representable
in float16 with values that are. This avoids precision mismatches
between CPU-side expected value computation (float32) and GPU-side
min precision results, where the cast to min16float rounds values
to the nearest float16 representation.

Key changes:
- Default1: -0.01f -> -0.03125f (exact power-of-2 fraction)
- Positive: 0.01f -> 0.03125f, 5531.0f -> 5504.0f,
  331.233f -> 331.25f, 3250.01f -> 3250.0f
- RangeHalfPi/RangeOne: replaced with float16-exact fractions
  covering the same ranges

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adjust input values so that arithmetic results (multiply, mad,
subtract, left-shift, wave prefix products) do not overflow 16-bit
integer range. Min precision types compute at >= 16 bits, so results
that overflow at 16 bits differ from the 32-bit expected values.

min16uint changes:
- Default1: reduced large values (699->199, 1023->200) so products
  and wave prefix products fit in uint16
- Default1: ensured all values >= Default2 to avoid subtract underflow
  (1->3, 6->10, 0->22)
- BitShiftRhs: reduced large shifts (13->12, 14->12, 15->12) so
  shifted values fit in uint16

min16int changes:
- BitShiftRhs: reduced large shifts (13->11, 14->11, 15->14) so
  shifted values fit in int16

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wave and quad intrinsics (WaveReadLaneAt, WaveReadLaneFirst,
WaveActiveSum/Min/Max/Product/AllEqual, WavePrefixSum/Product,
WaveMultiPrefixSum/Product, WaveMatch, QuadReadLaneAt,
QuadReadAcrossX/Y/Diagonal) do not support min precision types
(min16float, min16int, min16uint). The DXIL wave/quad shuffle
operations operate on 32-bit or 64-bit register slots and do not
handle 16-bit min precision payloads.

Removes 48 test entries (16 per min precision type) and adds
explanatory comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dot product tolerance computation was using float32 ULPs for
HLSLMin16Float_t, but the GPU may compute at float16 precision.
With NUM=256 elements the accumulated error exceeds the float32-based
epsilon. Use HLSLHalf_t::GetULP to compute half-precision ULPs for
min16float, matching the approach already used for HLSLHalf_t.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions bot commented Mar 14, 2026

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff f62b9b4be05f76adb41134e3abd9323be71e68f4 cb6175aa8b8258ce062fed1677f474b9f9e02a50 -- tools/clang/unittests/HLSLExec/LongVectorTestData.h tools/clang/unittests/HLSLExec/LongVectors.cpp
View the diff from clang-format here.
diff --git a/tools/clang/unittests/HLSLExec/LongVectors.cpp b/tools/clang/unittests/HLSLExec/LongVectors.cpp
index 28fba8d0..ed46e4b3 100644
--- a/tools/clang/unittests/HLSLExec/LongVectors.cpp
+++ b/tools/clang/unittests/HLSLExec/LongVectors.cpp
@@ -1889,7 +1889,8 @@ void dispatchMinPrecisionTest(ID3D12Device *D3DDevice, bool VerboseLogging,
   Op<OP, T, Operation.Arity> Op;
 
   // Min precision buffer storage width is implementation-defined, so we use
-  // full-precision types for Load/Store via BUFFER_TYPE/BUFFER_OUT_TYPE defines.
+  // full-precision types for Load/Store via BUFFER_TYPE/BUFFER_OUT_TYPE
+  // defines.
   for (size_t VectorSize : InputVectorSizes) {
     std::vector<std::vector<T>> Inputs =
         buildTestInputs<T>(VectorSize, Operation.InputSets, Operation.Arity);
@@ -1920,13 +1921,13 @@ void dispatchMinPrecisionWaveOpTest(ID3D12Device *D3DDevice,
   Op<OP, T, Operation.Arity> Op;
 
   // Min precision buffer storage width is implementation-defined, so we use
-  // full-precision types for Load/Store via BUFFER_TYPE/BUFFER_OUT_TYPE defines.
+  // full-precision types for Load/Store via BUFFER_TYPE/BUFFER_OUT_TYPE
+  // defines.
   for (size_t VectorSize : InputVectorSizes) {
     std::vector<std::vector<T>> Inputs =
         buildTestInputs<T>(VectorSize, Operation.InputSets, Operation.Arity);
 
-    auto Expected =
-        ExpectedBuilder<OP, T>::buildExpected(Op, Inputs, WaveSize);
+    auto Expected = ExpectedBuilder<OP, T>::buildExpected(Op, Inputs, WaveSize);
 
     using OutT = typename decltype(Expected)::value_type;
 
@@ -3020,7 +3021,6 @@ public:
   HLK_MIN_PRECISION_TEST(Or, HLSLMin16Int_t);
   HLK_MIN_PRECISION_TEST(Xor, HLSLMin16Int_t);
 
-
   // UnaryMath
   HLK_MIN_PRECISION_TEST(Abs, HLSLMin16Int_t);
   HLK_MIN_PRECISION_TEST(Sign, HLSLMin16Int_t);
@@ -3116,7 +3116,6 @@ public:
   HLK_MIN_PRECISION_TEST(Or, HLSLMin16Uint_t);
   HLK_MIN_PRECISION_TEST(Xor, HLSLMin16Uint_t);
 
-
   // UnaryMath
   HLK_MIN_PRECISION_TEST(Abs, HLSLMin16Uint_t);
   HLK_MIN_PRECISION_TEST(Sign, HLSLMin16Uint_t);

@alsepkow alsepkow force-pushed the user/alsepkow/MinPrecision branch from c6a3918 to d8cfc9e on March 16, 2026 20:49
alsepkow and others added 2 commits March 16, 2026 16:46
Cast operations have different input and output types (e.g. min16float
input with int32 output), so each side needs its own IO type mapping.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Enables constexpr on MaskShiftAmount's ShiftMask local variable,
restoring the original constexpr qualifier that was downgraded to const.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@alsepkow

Discussion: Shift operator semantics for min precision integer types

While working on these tests, I investigated how << and >> work on min16int/min16uint and found a potential semantic ambiguity worth discussing with the team.

How shifts are implemented

Shift operators aren't intrinsics — they're built-in binary operators handled through Clang's Sema/CodeGen layers:

  • Sema validation: SemaHLSL.cpp classifies shifts as "bitwise" ops requiring integral types (BinaryOperatorKindIsBitwise → BinaryOperatorKindRequiresIntegrals → IsBasicKindIntegral). min16int and min16uint have BPROP_INTEGER set, so they pass validation. min16float is correctly rejected.

  • Shift amount wrapping: HLSL wraps shift amounts like OpenCL — the RHS is masked by (bit_width - 1). This happens in CGExprScalar.cpp:EmitShl/EmitShr:

    if (CGF.getLangOpts().OpenCL || CGF.getLangOpts().HLSL)
        RHS = Builder.CreateAnd(RHS, GetWidthMinusOneValue(Ops.LHS, RHS), "shl.mask");

The ambiguity

min16int has AST Width = 16 (ASTContext.cpp:1683), producing LLVM IR type i16 (CodeGenTypes.cpp:437). So GetWidthMinusOneValue returns 15, and the shift amount is masked with & 15 (mod 16).

However, in min precision mode (UseMinPrecision=true), the hardware executes on 32-bit registers. The & 15 mask is baked into the IR before any min-precision-to-32-bit promotion. This means:

  • The shift amount is clamped to [0, 15] based on the declared 16-bit type
  • But the actual shift executes on a 32-bit value

For left shift, this is arguably fine — bits above 15 are "extra precision." For right shift, it's murkier — if the 32-bit register has bits set above bit 15 from prior operations, a right shift masked to [0, 15] won't clear them the way a true 16-bit right shift would.

Compare to int32_t where the mask is & 31 and execution width matches. For min precision, the wrapping width (16) doesn't necessarily match the execution width (potentially 32).

Test data is safe

The BitShiftRhs input set for int16_t/uint16_t (reused by Min16Int_t/Min16UInt_t) uses values {1, 6, 3, 0, 9, 3, 12, 13, 14, 15} — all within [0, 15], so we avoid the ambiguous boundary. This was intentional.

Existing test coverage

  • Sema tests (scalar-operators.hlsl): verify type rules for min16 shifts ✅
  • FileCheck tests for shift IR output on min precision: none ❌
  • Execution tests for min precision shifts: this PR adds the first ones ✅

Questions for the team

  1. Is the & 15 mask correct for min precision types, or should it be & 31 to match the actual execution width?
  2. Should we add a boundary-probing test (e.g., shift by 16) to document/pin down the current behavior?
  3. Is this a known design decision or a gap that should be tracked as a separate issue?

alsepkow and others added 7 commits March 16, 2026 17:38
These input sets are not referenced by any test entry since wave ops
are excluded for min precision types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only define -DMIN_PRECISION, -DIO_TYPE, and -DIO_OUT_TYPE for min
precision test types. Shader templates use #ifdef MIN_PRECISION to
gate the load-with-cast paths, leaving non-min-precision shaders
completely unchanged. Fallback #ifndef defines ensure IO_OUT_TYPE
resolves to OUT_TYPE for Store calls when MIN_PRECISION is not set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wave, quad, splitdouble, frexp, and modf stores will never be reached
by min precision test types, so IO_OUT_TYPE is unnecessary. Keep it
only for the final main store and derivative stores which are exercised
by min precision tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move min precision IO_TYPE/IO_OUT_TYPE/MIN_PRECISION defines from
getCompilerOptionsString into a dedicated dispatchMinPrecisionTest
function that passes them via AdditionalCompilerOptions. This matches
the existing pattern used by dispatchWaveOpTest and keeps the shared
compiler options builder clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use if constexpr on the C++ type instead of strcmp on HLSL type
strings. Cleaner and resolved at compile time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add IOTypeString field to DataType and MIN_PRECISION_DATA_TYPE macro.
Remove standalone getIOTypeString template function. The IO type
mapping now lives alongside the other type metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@alsepkow alsepkow marked this pull request as ready for review March 17, 2026 01:21
alsepkow and others added 5 commits March 17, 2026 13:43
Bit shifting is not supported for min precision data types (min16int,
min16uint). Remove LeftShift/RightShift test entries and their associated
BitShiftRhs input data sets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add wave and quad op test entries for min16float, min16int, and min16uint
long vector types, mirroring the ops supported by their 16-bit equivalents.

Test entries added:
- Min16Float: 12 wave ops + 4 quad ops
- Min16Int: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops
- Min16Uint: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops

Infrastructure changes:
- ShaderOpArith.xml: Add #ifdef MIN_PRECISION guards to 7 wave op Store
  calls so they use IO_OUT_TYPE for min precision buffer I/O while keeping
  OUT_TYPE for standard types (no DXIL change for existing tests)
- LongVectors.cpp: Add dispatchMinPrecisionWaveOpTest combining WaveSize
  with IO_TYPE/MIN_PRECISION compiler options, and HLK_MIN_PRECISION_WAVEOP_TEST
  macro
- LongVectorTestData.h: Add WaveMultiPrefixBitwise input sets for
  HLSLMin16Int_t and HLSLMin16Uint_t
- Fix ambiguous operator& in waveMultiPrefixBitOr for min precision types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename the shader defines and C++ field to better communicate their purpose:
these specify the full-precision buffer storage type for Load/Store
operations, not a generic I/O type.

- IO_TYPE -> BUFFER_TYPE
- IO_OUT_TYPE -> BUFFER_OUT_TYPE
- IOTypeString -> BufferTypeString

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract duplicated WaveSize computation from runWaveOpTest and
runMinPrecisionWaveOpTest into a shared getWaveSize() method.

Update section comments to say 'mirrors applicable ops' since not all
ops from the 16-bit equivalent types are supported for min precision.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Successfully merging this pull request may close these issues.

Long Vector Execution Tests: Add test cases using min precision values