[HLSL] Add GroupWaveIndex/Count execution test#8277
[HLSL] Add GroupWaveIndex/Count execution test#8277JoeCitizen wants to merge 1 commit intomicrosoft:mainfrom
Conversation
Add an HLK execution test for the SM 6.10 GetGroupWaveIndex() and GetGroupWaveCount() intrinsics per the spec at https://microsoft.github.io/hlsl-specs/proposals/0048-group-wave-index/. The test dispatches a compute shader that writes per-thread wave index, wave count, lane index, lane count, and first-lane group index to a UAV buffer, then verifies: - GetGroupWaveCount() is uniform across all threads in the group - GetGroupWaveCount() >= ceil(threadGroupSize / WaveGetLaneCount()) - GetGroupWaveIndex() is in range [0, waveCount) - GetGroupWaveIndex() is uniform within each wave - Each wave has a distinct index covering [0, waveCount) Test configurations cover: - Multiple thread group sizes: 8, 64, 256, 1024 threads - 1D, 2D, and 3D thread group dimensions - Non-power-of-2 thread group size - WaveSize attribute interaction for each supported wave size - Single-wave edge case (numthreads <= waveSize) Also adds D3D_SHADER_MODEL_6_10 local definition to HlslExecTestUtils.h since the released Windows SDK only defines up to SM 6.9. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| if (!doesDeviceSupportWaveOps(Device)) { | ||
| WEX::Logging::Log::Comment(L"Device does not support wave operations."); | ||
| WEX::Logging::Log::Result(WEX::Logging::TestResults::Skipped); | ||
| return; | ||
| } |
There was a problem hiding this comment.
This shouldn't ever happen - if it does we should fail the test, but I don't think the runtime would let you create the device. WaveOps support is required as of SM 6.9.
| uint32_t groupIndex; | ||
| uint32_t waveIndex; | ||
| uint32_t waveCount; | ||
| uint32_t laneIndex; | ||
| uint32_t laneCount; | ||
| uint32_t firstLaneGroupIndex; | ||
| }; |
There was a problem hiding this comment.
| uint32_t groupIndex; | |
| uint32_t waveIndex; | |
| uint32_t waveCount; | |
| uint32_t laneIndex; | |
| uint32_t laneCount; | |
| uint32_t firstLaneGroupIndex; | |
| }; | |
| uint32_t GroupIndex; | |
| uint32_t WaveIndex; | |
| uint32_t WaveCount; | |
| uint32_t LaneIndex; | |
| uint32_t LaneCount; | |
| uint32_t FirstLaneGroupIndex; | |
| }; |
| // WaveSize attribute, injected via compiler -D options. | ||
| const char Shader[] = | ||
| R"(struct GroupWaveData { | ||
| uint groupIndex; |
There was a problem hiding this comment.
We should use LLVM naming conventions in execution test shader code too.
There was a problem hiding this comment.
Pull request overview
Adds a new HLSL execution test validating SM 6.10 GetGroupWaveIndex() / GetGroupWaveCount() behavior across multiple threadgroup shapes and optional [WaveSize] configurations.
Changes:
- Adds a new
GroupWaveIndexTestShaderOp definition with a readback UAV buffer. - Implements
ExecutionTest::GroupWaveIndexTest()which compiles a parameterized compute shader and validates wave invariants from readback. - Expands execution test registration to include the new test method.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tools/clang/unittests/HLSLExec/ShaderOpArith.xml | Adds a new ShaderOp + UAV resource to support the new wave-index/count execution test. |
| tools/clang/unittests/HLSLExec/ExecutionTest.cpp | Implements and registers GroupWaveIndexTest, including shader generation, dispatch execution, and result verification. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| <ShaderOp Name="GroupWaveIndexTest" CS="CS"> | ||
| <RootSignature>RootFlags(0), UAV(u0)</RootSignature> | ||
| <Resource Name="UAVBuffer0" Dimension="BUFFER" Width="32768" InitialResourceState="COPY_DEST" Init="ByName" Flags="ALLOW_UNORDERED_ACCESS" TransitionTo="UNORDERED_ACCESS" ReadBack="true" Format="R32_TYPELESS" /> | ||
| <RootValues> | ||
| <RootValue Index="0" ResName="UAVBuffer0" /> | ||
| </RootValues> | ||
| <Shader Name="CS" Target="cs_6_10"> | ||
| <![CDATA[// Shader source code will be set at runtime]]> | ||
| </Shader> | ||
| </ShaderOp> |
| <RootValues> | ||
| <RootValue Index="0" ResName="UAVBuffer0" /> | ||
| </RootValues> | ||
| <Shader Name="CS" Target="cs_6_10"> |
| D3D12_FEATURE_DATA_D3D12_OPTIONS1 WaveOpts; | ||
| VERIFY_SUCCEEDED( | ||
| Device->CheckFeatureSupport((D3D12_FEATURE)D3D12_FEATURE_D3D12_OPTIONS1, | ||
| &WaveOpts, sizeof(WaveOpts))); |
| GroupWaveData *InData = (GroupWaveData *)Data.data(); | ||
| memset(InData, 0, sizeof(GroupWaveData) * NumThreads); |
| MappedData DataUav; | ||
| Test->Test->GetReadBackData("UAVBuffer0", &DataUav); | ||
| VERIFY_IS_TRUE(sizeof(GroupWaveData) * NumThreads <= DataUav.size()); | ||
| const GroupWaveData *Results = (const GroupWaveData *)DataUav.data(); |
| st::RunShaderOpTestAfterParse( | ||
| Device, m_support, "GroupWaveIndexTest", | ||
| [&](LPCSTR Name, std::vector<BYTE> &Data, st::ShaderOp *ShaderOp) { | ||
| VERIFY_IS_TRUE((0 == strncmp(Name, "UAVBuffer0", 10))); |
| // Verify waveCount >= ceil(threadGroupSize / laneCount) per spec. | ||
| uint32_t LaneCount = Results[0].laneCount; | ||
| uint32_t MinWaves = (NumThreads + LaneCount - 1) / LaneCount; | ||
| LogCommentFmt(L" waveCount=%u, laneCount=%u, minWaves=%u", WaveCount, | ||
| LaneCount, MinWaves); | ||
| VERIFY_IS_GREATER_THAN_OR_EQUAL(WaveCount, MinWaves); |
| TEST_METHOD(GroupSharedLimitTest); | ||
| TEST_METHOD(GroupSharedLimitASTest); | ||
| TEST_METHOD(GroupSharedLimitMSTest); | ||
| TEST_METHOD(GroupWaveIndexTest); |
There was a problem hiding this comment.
You're going to need to decorate the new 6.10 test methods with an HLK requirement (Kits.Specification) and a GUID (Kits.TestId) to make these actually show up in the HLK. I alluded to that on the last PR but it would be good to make sure we at least track that.
Also, we still need to add a requirements xml to the OS for the HLK requirement to actually exist.
Heres an example of a decorated test method in the LongVector tests.
Test can have requirement and ID at the class level if all the tests in the class should map to that requirement. If we're going to add these 6.10 tests to ExecutionTests.cpp then we'll either want to do the per test method option or add in a sub test class for them.
The GUID is the thing that the HLK infra will use for 'registering' the test and executing it. So, if a GUID were to say map to a test class then those tests would be deployed on a sing machine to run in the HLK environment.
There was a problem hiding this comment.
I'm assuming we want these in the HLK :)
| const GroupWaveData *Results = (const GroupWaveData *)DataUav.data(); | ||
|
|
||
| // Verify waveCount is uniform across all threads and >= 1. | ||
| uint32_t WaveCount = Results[0].waveCount; |
There was a problem hiding this comment.
| uint32_t WaveCount = Results[0].waveCount; | |
| const uint32_t WaveCount = Results[0].waveCount; |
| uint32_t LaneCount = Results[0].laneCount; | ||
| uint32_t MinWaves = (NumThreads + LaneCount - 1) / LaneCount; |
There was a problem hiding this comment.
| uint32_t LaneCount = Results[0].laneCount; | |
| uint32_t MinWaves = (NumThreads + LaneCount - 1) / LaneCount; | |
| const uint32_t LaneCount = Results[0].laneCount; | |
| const uint32_t MinWaves = (NumThreads + LaneCount - 1) / LaneCount; |
|
|
||
| // Verify waveCount is uniform across all threads and >= 1. | ||
| uint32_t WaveCount = Results[0].waveCount; | ||
| VERIFY_IS_GREATER_THAN_OR_EQUAL(WaveCount, (uint32_t)1); |
There was a problem hiding this comment.
| VERIFY_IS_GREATER_THAN_OR_EQUAL(WaveCount, (uint32_t)1); | |
| VERIFY_IS_GREATER_THAN_OR_EQUAL(WaveCount, 1u); |
| std::set<uint32_t> SeenWaveIndices; | ||
| for (auto &WavePair : Waves) { | ||
| const std::vector<const GroupWaveData *> &Lanes = WavePair.second; | ||
| VERIFY_IS_GREATER_THAN_OR_EQUAL(Lanes.size(), (size_t)1); |
There was a problem hiding this comment.
| VERIFY_IS_GREATER_THAN_OR_EQUAL(Lanes.size(), (size_t)1); | |
| VERIFY_IS_GREATER_THAN_OR_EQUAL(Lanes.size(), 1u); |
| } | ||
|
|
||
| // Verify all wave indices from 0 to waveCount-1 are present. | ||
| VERIFY_ARE_EQUAL((uint32_t)SeenWaveIndices.size(), WaveCount); |
There was a problem hiding this comment.
| VERIFY_ARE_EQUAL((uint32_t)SeenWaveIndices.size(), WaveCount); | |
| VERIFY_ARE_EQUAL(SeenWaveIndices.size(), static_cast<size_t>(WaveCount)); |
There was a problem hiding this comment.
I don't think this is suggesting what you want to suggest:
- the parens aren't balanced
- if we're going to cast, then we should cast them to the same type -
size()issize_t.
Maybe you wanted to suggest:
VERIFY_ARE_EQUAL(SeenWaveIndices.size(), static_cast<size_t>(WaveCount));| } | ||
|
|
||
| // Verify number of distinct waves matches waveCount. | ||
| VERIFY_ARE_EQUAL((uint32_t)Waves.size(), WaveCount); |
There was a problem hiding this comment.
| VERIFY_ARE_EQUAL((uint32_t)Waves.size(), WaveCount); | |
| VERIFY_ARE_EQUAL(Waves.size(), static_cast<size_t>(WaveCount)); |
Add an HLK execution test for the SM 6.10 GetGroupWaveIndex() and GetGroupWaveCount() intrinsics per the spec at
https://microsoft.github.io/hlsl-specs/proposals/0048-group-wave-index/.
The test dispatches a compute shader that writes per-thread wave index, wave count, lane index, lane count, and first-lane group index to a UAV buffer, then verifies:
Test configurations cover: