Execution Tests: Add min precision test cases to the long vector test #8260
alsepkow wants to merge 32 commits into microsoft:main from
Conversation
Add device support helper, wrapper types, input data sets, type registration, and validation for min16float, min16int, min16uint.
- doesDeviceSupportMinPrecision() checks D3D12 MinPrecisionSupport
- HLSLMin16Float_t/HLSLMin16Int_t/HLSLMin16Uint_t wrapper structs (32-bit storage, matching DXIL layout without -enable-16bit-types)
- Input data constrained to 16-bit representable range
- DATA_TYPE registrations and isFloatingPointType/isMinPrecisionType traits
- doValuesMatch overloads: min16float compares in half-precision space (reuses CompareHalfULP/CompareHalfEpsilon), integers use exact match
- TrigonometricValidation specializations matching HLSLHalf_t tolerances

Part of: microsoft#7780
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add DxilConf_SM69_Vectorized_MinPrecision test class with HLK_TEST_MINP and HLK_WAVEOP_TEST_MINP macros. Mirrors 16-bit counterpart coverage (HLSLHalf_t/int16_t/uint16_t) minus documented exclusions.
- New test class with Kits.Specification = Device.Graphics.D3D12.DXILCore.ShaderModel69.MinPrecision
- setupClass skips when device lacks min precision support
- ~160 test entries across 3 types (min16float/min16int/min16uint)
- MakeDifferent overloads in ShaderOpArith.xml (not gated by __HLSL_ENABLE_16_BIT since min precision is always available)
- Excluded: FP specials, AsType, Cast, bit-manipulation ops

Part of: microsoft#7780
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make all but one conversion operator explicit per wrapper type to avoid C2666 ambiguity with built-in arithmetic operators. Matches HLSLHalf_t pattern: one implicit conversion to the natural type (float/int32_t/uint32_t), all others explicit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The implicit operator float() combined with implicit constructors from int/uint32_t created ambiguity for expressions like 'A + 4': the compiler could not choose between member operator+(HLSLMin16Float_t) via constructor and built-in float+int via conversion. Making the int/uint constructors explicit eliminates the member operator+ path for int literals while preserving T(0) direct construction and implicit float conversion for std:: math functions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HLSLMin16Int_t: add uint32_t and uint64_t constructors for static_cast<T>(UINT) and static_cast<T>(size_t) patterns used in shift masking and wave ops. Add operator~() for bitwise NOT in WaveMultiPrefixBit ops. HLSLMin16Uint_t: add operator~() for the same reason. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
UnaryMathAbs: extend unsigned check to include HLSLMin16Uint_t (std::is_unsigned_v is false for class types, so abs was called with ambiguous overloads via implicit operator uint32_t). MaskShiftAmount: change constexpr to const since wrapper types are not literal types (no constexpr constructors). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ~27 Cast test entries (CastToBool, CastToInt16, CastToInt32, CastToInt64, CastToUint16/32/64, CastToFloat16/32) for all three min precision types. The generic Cast templates work via the single implicit conversion operator on each wrapper type — C-style casts chain through it (e.g. (int32_t)min16float goes float->int32_t). Remove explicit conversion operators (operator double, operator int32_t, etc.) that were not exercised since Cast tests were not previously included and no other code paths use them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
WaveMultiPrefixBitAnd/BitOr/BitXor use the any_int type set (g_AnyIntCT)
which is defined as {int16, int32, int64, uint16, uint32, uint64} and does
not include min precision integer types (min16int, min16uint). Remove the 6
invalid test entries and the now-unused operator~() from both integer
wrapper types.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move all min precision test entries (min16float, min16int, min16uint) from the separate DxilConf_SM69_Vectorized_MinPrecision class into DxilConf_SM69_Vectorized_Core, using HLK_TEST/HLK_WAVEOP_TEST macros. Remove the HLK_TEST_MINP and HLK_WAVEOP_TEST_MINP macro definitions, the DxilConf_SM69_Vectorized_MinPrecision class, and the doesDeviceSupportMinPrecision utility function since min precision support checking is not required. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Min precision types (min16float, min16int, min16uint) are hints that allow hardware to use any precision >= the specified minimum, making buffer storage width implementation-defined. Add IO_TYPE/IO_OUT_TYPE compiler defines that map min precision types to their full-precision equivalents (float, int, uint) for buffer Load/Store operations. For all other types, IO_TYPE equals TYPE and IO_OUT_TYPE equals OUT_TYPE. This ensures deterministic buffer data layout regardless of the device's min precision implementation, while still testing min precision computation via explicit casts between the I/O types and the min precision types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HLSL does not support signed integer division on minimum-precision types. The compiler rejects these with: 'signed integer division is not supported on minimum-precision types, cast to int to use 32-bit division'. Remove the Divide and Modulus test entries for HLSLMin16Int_t. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace min16float input values that are not exactly representable in float16 with values that are. This avoids precision mismatches between CPU-side expected value computation (float32) and GPU-side min precision results, where the cast to min16float rounds values to the nearest float16 representation.

Key changes:
- Default1: -0.01f -> -0.03125f (exact power-of-2 fraction)
- Positive: 0.01f -> 0.03125f, 5531.0f -> 5504.0f, 331.233f -> 331.25f, 3250.01f -> 3250.0f
- RangeHalfPi/RangeOne: replaced with float16-exact fractions covering the same ranges

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adjust input values so that arithmetic results (multiply, mad, subtract, left-shift, wave prefix products) do not overflow the 16-bit integer range. Min precision types compute at >= 16 bits, so results that overflow at 16 bits differ from the 32-bit expected values.

min16uint changes:
- Default1: reduced large values (699->199, 1023->200) so products and wave prefix products fit in uint16
- Default1: ensured all values >= Default2 to avoid subtract underflow (1->3, 6->10, 0->22)
- BitShiftRhs: reduced large shifts (13->12, 14->12, 15->12) so shifted values fit in uint16

min16int changes:
- BitShiftRhs: reduced large shifts (13->11, 14->11, 15->14) so shifted values fit in int16

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wave and quad intrinsics (WaveReadLaneAt, WaveReadLaneFirst, WaveActiveSum/Min/Max/Product/AllEqual, WavePrefixSum/Product, WaveMultiPrefixSum/Product, WaveMatch, QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) do not support min precision types (min16float, min16int, min16uint). The DXIL wave/quad shuffle operations operate on 32-bit or 64-bit register slots and do not handle 16-bit min precision payloads. Removes 48 test entries (16 per min precision type) and adds explanatory comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dot product tolerance computation was using float32 ULPs for HLSLMin16Float_t, but the GPU may compute at float16 precision. With NUM=256 elements the accumulated error exceeds the float32-based epsilon. Use HLSLHalf_t::GetULP to compute half-precision ULPs for min16float, matching the approach already used for HLSLHalf_t. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
You can test this locally with the following command:

git-clang-format --diff f62b9b4be05f76adb41134e3abd9323be71e68f4 cb6175aa8b8258ce062fed1677f474b9f9e02a50 -- tools/clang/unittests/HLSLExec/LongVectorTestData.h tools/clang/unittests/HLSLExec/LongVectors.cpp

View the diff from clang-format here.

diff --git a/tools/clang/unittests/HLSLExec/LongVectors.cpp b/tools/clang/unittests/HLSLExec/LongVectors.cpp
index 28fba8d0..ed46e4b3 100644
--- a/tools/clang/unittests/HLSLExec/LongVectors.cpp
+++ b/tools/clang/unittests/HLSLExec/LongVectors.cpp
@@ -1889,7 +1889,8 @@ void dispatchMinPrecisionTest(ID3D12Device *D3DDevice, bool VerboseLogging,
Op<OP, T, Operation.Arity> Op;
// Min precision buffer storage width is implementation-defined, so we use
- // full-precision types for Load/Store via BUFFER_TYPE/BUFFER_OUT_TYPE defines.
+ // full-precision types for Load/Store via BUFFER_TYPE/BUFFER_OUT_TYPE
+ // defines.
for (size_t VectorSize : InputVectorSizes) {
std::vector<std::vector<T>> Inputs =
buildTestInputs<T>(VectorSize, Operation.InputSets, Operation.Arity);
@@ -1920,13 +1921,13 @@ void dispatchMinPrecisionWaveOpTest(ID3D12Device *D3DDevice,
Op<OP, T, Operation.Arity> Op;
// Min precision buffer storage width is implementation-defined, so we use
- // full-precision types for Load/Store via BUFFER_TYPE/BUFFER_OUT_TYPE defines.
+ // full-precision types for Load/Store via BUFFER_TYPE/BUFFER_OUT_TYPE
+ // defines.
for (size_t VectorSize : InputVectorSizes) {
std::vector<std::vector<T>> Inputs =
buildTestInputs<T>(VectorSize, Operation.InputSets, Operation.Arity);
- auto Expected =
- ExpectedBuilder<OP, T>::buildExpected(Op, Inputs, WaveSize);
+ auto Expected = ExpectedBuilder<OP, T>::buildExpected(Op, Inputs, WaveSize);
using OutT = typename decltype(Expected)::value_type;
@@ -3020,7 +3021,6 @@ public:
HLK_MIN_PRECISION_TEST(Or, HLSLMin16Int_t);
HLK_MIN_PRECISION_TEST(Xor, HLSLMin16Int_t);
-
// UnaryMath
HLK_MIN_PRECISION_TEST(Abs, HLSLMin16Int_t);
HLK_MIN_PRECISION_TEST(Sign, HLSLMin16Int_t);
@@ -3116,7 +3116,6 @@ public:
HLK_MIN_PRECISION_TEST(Or, HLSLMin16Uint_t);
HLK_MIN_PRECISION_TEST(Xor, HLSLMin16Uint_t);
-
// UnaryMath
HLK_MIN_PRECISION_TEST(Abs, HLSLMin16Uint_t);
HLK_MIN_PRECISION_TEST(Sign, HLSLMin16Uint_t);
Cast operations have different input and output types (e.g. min16float input with int32 output), so each side needs its own IO type mapping. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Enables constexpr on MaskShiftAmount's ShiftMask local variable, restoring the original constexpr qualifier that was downgraded to const. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Discussion: Shift operator semantics for min precision integer types

While working on these tests, I investigated how shifts are implemented for min precision integer types.

How shifts are implemented

Shift operators aren't intrinsics — they're built-in binary operators handled through Clang's Sema/CodeGen layers:

The ambiguity

However, in min precision mode (

For left shift, this is arguably fine — bits above 15 are "extra precision." For right shift, it's murkier — if the 32-bit register has bits set above bit 15 from prior operations, a right shift masked to

Test data is safe

The

Existing test coverage

Questions for the team
These input sets are not referenced by any test entry since wave ops are excluded for min precision types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only define -DMIN_PRECISION, -DIO_TYPE, and -DIO_OUT_TYPE for min precision test types. Shader templates use #ifdef MIN_PRECISION to gate the load-with-cast paths, leaving non-min-precision shaders completely unchanged. Fallback #ifndef defines ensure IO_OUT_TYPE resolves to OUT_TYPE for Store calls when MIN_PRECISION is not set. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Wave, quad, splitdouble, frexp, and modf stores will never be reached by min precision test types, so IO_OUT_TYPE is unnecessary. Keep it only for the final main store and derivative stores which are exercised by min precision tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move min precision IO_TYPE/IO_OUT_TYPE/MIN_PRECISION defines from getCompilerOptionsString into a dedicated dispatchMinPrecisionTest function that passes them via AdditionalCompilerOptions. This matches the existing pattern used by dispatchWaveOpTest and keeps the shared compiler options builder clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use if constexpr on the C++ type instead of strcmp on HLSL type strings. Cleaner and resolved at compile time. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add IOTypeString field to DataType and MIN_PRECISION_DATA_TYPE macro. Remove standalone getIOTypeString template function. The IO type mapping now lives alongside the other type metadata. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bit shifting is not supported for min precision data types (min16int, min16uint). Remove LeftShift/RightShift test entries and their associated BitShiftRhs input data sets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add wave and quad op test entries for min16float, min16int, and min16uint long vector types, mirroring the ops supported by their 16-bit equivalents.

Test entries added:
- Min16Float: 12 wave ops + 4 quad ops
- Min16Int: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops
- Min16Uint: 15 wave ops (includes WaveMultiPrefixBit*) + 4 quad ops

Infrastructure changes:
- ShaderOpArith.xml: Add #ifdef MIN_PRECISION guards to 7 wave op Store calls so they use IO_OUT_TYPE for min precision buffer I/O while keeping OUT_TYPE for standard types (no DXIL change for existing tests)
- LongVectors.cpp: Add dispatchMinPrecisionWaveOpTest combining WaveSize with IO_TYPE/MIN_PRECISION compiler options, and HLK_MIN_PRECISION_WAVEOP_TEST macro
- LongVectorTestData.h: Add WaveMultiPrefixBitwise input sets for HLSLMin16Int_t and HLSLMin16Uint_t
- Fix ambiguous operator& in waveMultiPrefixBitOr for min precision types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename the shader defines and C++ field to better communicate their purpose: these specify the full-precision buffer storage type for Load/Store operations, not a generic I/O type. - IO_TYPE -> BUFFER_TYPE - IO_OUT_TYPE -> BUFFER_OUT_TYPE - IOTypeString -> BufferTypeString Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract duplicated WaveSize computation from runWaveOpTest and runMinPrecisionWaveOpTest into a shared getWaveSize() method. Update section comments to say 'mirrors applicable ops' since not all ops from the 16-bit equivalent types are supported for min precision. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This PR extends the SM 6.9 long vector execution tests to cover HLSL min precision types (min16float, min16int, min16uint). These types are always available — D3D12_SHADER_MIN_PRECISION_SUPPORT only reports whether hardware actually uses reduced precision, not whether the types compile — so no device capability check is needed and the tests live in the existing DxilConf_SM69_Vectorized_Core class alongside other types.

Note: I wasn't able to find any existing min precision HLK tests. Unclear if we have coverage.
Key design decisions
Full-precision buffer I/O: Min precision types have implementation-defined buffer storage width, so we use full-precision types (float/int/uint) for all Load/Store operations via the IO_TYPE/IO_OUT_TYPE shader defines, with explicit casts to/from the min precision compute type. This ensures deterministic data layout regardless of the device implementation.

Half-precision tolerances: Validation compares results in fp16 space using HLSLHalf_t ULP tolerances. Since min precision guarantees at least 16 bits, fp16 tolerances are a correct upper bound — devices computing at higher precision will produce more accurate results, not less.
Test coverage mirrors existing patterns:
Wave and quad op support: Wave ops (WaveActiveSum/Min/Max/Product/AllEqual, WaveReadLaneAt/First, WavePrefix*, WaveMultiPrefix*, WaveMatch) and quad ops (QuadReadLaneAt, QuadReadAcrossX/Y/Diagonal) are tested for all three min precision types, mirroring the ops supported by their 16-bit equivalents. The wave op shader helpers use #ifdef MIN_PRECISION guards to store results via IO_OUT_TYPE for deterministic buffer layout without changing DXIL for existing non-min-precision tests.

Excluded operations:
Resolves #7780
The array accessor and wave/quad op tests for min precision require the optimizer fix from: #8269
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>