Skip to content

[HLSL] Add GroupWaveIndex/Count execution test#8277

Open
JoeCitizen wants to merge 1 commit intomicrosoft:mainfrom
JoeCitizen:user/jackell/GroupWaveIndex_HLK
Open

[HLSL] Add GroupWaveIndex/Count execution test#8277
JoeCitizen wants to merge 1 commit intomicrosoft:mainfrom
JoeCitizen:user/jackell/GroupWaveIndex_HLK

Conversation

@JoeCitizen
Copy link
Collaborator

Add an HLK execution test for the SM 6.10 GetGroupWaveIndex() and GetGroupWaveCount() intrinsics per the spec at
https://microsoft.github.io/hlsl-specs/proposals/0048-group-wave-index/.

The test dispatches a compute shader that writes per-thread wave index, wave count, lane index, lane count, and first-lane group index to a UAV buffer, then verifies:

  • GetGroupWaveCount() is uniform across all threads in the group
  • GetGroupWaveCount() >= ceil(threadGroupSize / WaveGetLaneCount())
  • GetGroupWaveIndex() is in range [0, waveCount)
  • GetGroupWaveIndex() is uniform within each wave
  • Each wave has a distinct index covering [0, waveCount)

Test configurations cover:

  • Multiple thread group sizes: 8, 64, 256, 1024 threads
  • 1D, 2D, and 3D thread group dimensions
  • Non-power-of-2 thread group size
  • WaveSize attribute interaction for each supported wave size
  • Single-wave edge case (numthreads <= waveSize)

Add an HLK execution test for the SM 6.10 GetGroupWaveIndex() and
GetGroupWaveCount() intrinsics per the spec at
https://microsoft.github.io/hlsl-specs/proposals/0048-group-wave-index/.

The test dispatches a compute shader that writes per-thread wave index,
wave count, lane index, lane count, and first-lane group index to a UAV
buffer, then verifies:
- GetGroupWaveCount() is uniform across all threads in the group
- GetGroupWaveCount() >= ceil(threadGroupSize / WaveGetLaneCount())
- GetGroupWaveIndex() is in range [0, waveCount)
- GetGroupWaveIndex() is uniform within each wave
- Each wave has a distinct index covering [0, waveCount)

Test configurations cover:
- Multiple thread group sizes: 8, 64, 256, 1024 threads
- 1D, 2D, and 3D thread group dimensions
- Non-power-of-2 thread group size
- WaveSize attribute interaction for each supported wave size
- Single-wave edge case (numthreads <= waveSize)

Also adds D3D_SHADER_MODEL_6_10 local definition to HlslExecTestUtils.h
since the released Windows SDK only defines up to SM 6.9.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment on lines +10956 to +10960
if (!doesDeviceSupportWaveOps(Device)) {
WEX::Logging::Log::Comment(L"Device does not support wave operations.");
WEX::Logging::Log::Result(WEX::Logging::TestResults::Skipped);
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't ever happen - if it does we should fail the test, but I don't think the runtime would let you create the device. WaveOps support is required as of SM 6.9.

Comment on lines +10971 to +10977
uint32_t groupIndex;
uint32_t waveIndex;
uint32_t waveCount;
uint32_t laneIndex;
uint32_t laneCount;
uint32_t firstLaneGroupIndex;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint32_t groupIndex;
uint32_t waveIndex;
uint32_t waveCount;
uint32_t laneIndex;
uint32_t laneCount;
uint32_t firstLaneGroupIndex;
};
uint32_t GroupIndex;
uint32_t WaveIndex;
uint32_t WaveCount;
uint32_t LaneIndex;
uint32_t LaneCount;
uint32_t FirstLaneGroupIndex;
};

// WaveSize attribute, injected via compiler -D options.
const char Shader[] =
R"(struct GroupWaveData {
uint groupIndex;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use LLVM naming conventions in execution test shader code too.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new HLSL execution test validating SM 6.10 GetGroupWaveIndex() / GetGroupWaveCount() behavior across multiple threadgroup shapes and optional [WaveSize] configurations.

Changes:

  • Adds a new GroupWaveIndexTest ShaderOp definition with a readback UAV buffer.
  • Implements ExecutionTest::GroupWaveIndexTest() which compiles a parameterized compute shader and validates wave invariants from readback.
  • Expands execution test registration to include the new test method.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
tools/clang/unittests/HLSLExec/ShaderOpArith.xml Adds a new ShaderOp + UAV resource to support the new wave-index/count execution test.
tools/clang/unittests/HLSLExec/ExecutionTest.cpp Implements and registers GroupWaveIndexTest, including shader generation, dispatch execution, and result verification.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1918 to +1927
<ShaderOp Name="GroupWaveIndexTest" CS="CS">
<RootSignature>RootFlags(0), UAV(u0)</RootSignature>
<Resource Name="UAVBuffer0" Dimension="BUFFER" Width="32768" InitialResourceState="COPY_DEST" Init="ByName" Flags="ALLOW_UNORDERED_ACCESS" TransitionTo="UNORDERED_ACCESS" ReadBack="true" Format="R32_TYPELESS" />
<RootValues>
<RootValue Index="0" ResName="UAVBuffer0" />
</RootValues>
<Shader Name="CS" Target="cs_6_10">
<![CDATA[// Shader source code will be set at runtime]]>
</Shader>
</ShaderOp>
<RootValues>
<RootValue Index="0" ResName="UAVBuffer0" />
</RootValues>
<Shader Name="CS" Target="cs_6_10">
Comment on lines +10963 to +10966
D3D12_FEATURE_DATA_D3D12_OPTIONS1 WaveOpts;
VERIFY_SUCCEEDED(
Device->CheckFeatureSupport((D3D12_FEATURE)D3D12_FEATURE_D3D12_OPTIONS1,
&WaveOpts, sizeof(WaveOpts)));
Comment on lines +11070 to +11071
GroupWaveData *InData = (GroupWaveData *)Data.data();
memset(InData, 0, sizeof(GroupWaveData) * NumThreads);
MappedData DataUav;
Test->Test->GetReadBackData("UAVBuffer0", &DataUav);
VERIFY_IS_TRUE(sizeof(GroupWaveData) * NumThreads <= DataUav.size());
const GroupWaveData *Results = (const GroupWaveData *)DataUav.data();
st::RunShaderOpTestAfterParse(
Device, m_support, "GroupWaveIndexTest",
[&](LPCSTR Name, std::vector<BYTE> &Data, st::ShaderOp *ShaderOp) {
VERIFY_IS_TRUE((0 == strncmp(Name, "UAVBuffer0", 10)));
Comment on lines +11087 to +11092
// Verify waveCount >= ceil(threadGroupSize / laneCount) per spec.
uint32_t LaneCount = Results[0].laneCount;
uint32_t MinWaves = (NumThreads + LaneCount - 1) / LaneCount;
LogCommentFmt(L" waveCount=%u, laneCount=%u, minWaves=%u", WaveCount,
LaneCount, MinWaves);
VERIFY_IS_GREATER_THAN_OR_EQUAL(WaveCount, MinWaves);
Copy link
Contributor

@alsepkow alsepkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments. Reviewed with @inbelic

TEST_METHOD(GroupSharedLimitTest);
TEST_METHOD(GroupSharedLimitASTest);
TEST_METHOD(GroupSharedLimitMSTest);
TEST_METHOD(GroupWaveIndexTest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going to need to decorate the new 6.10 test methods with an HLK requirement (Kits.Specification) and a GUID (Kits.TestId) to make these actually show up in the HLK. I alluded to that on the last PR but it would be good to make sure we at least track that.

Also, we still need to add a requirements xml to the OS for the HLK requirement to actually exist.

Heres an example of a decorated test method in the LongVector tests.

Test can have requirement and ID at the class level if all the tests in the class should map to that requirement. If we're going to add these 6.10 tests to ExecutionTests.cpp then we'll either want to do the per test method option or add in a sub test class for them.

The GUID is the thing that the HLK infra will use for 'registering' the test and executing it. So, if a GUID were to say map to a test class then those tests would be deployed on a sing machine to run in the HLK environment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming we want these in the HLK :)

const GroupWaveData *Results = (const GroupWaveData *)DataUav.data();

// Verify waveCount is uniform across all threads and >= 1.
uint32_t WaveCount = Results[0].waveCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint32_t WaveCount = Results[0].waveCount;
const uint32_t WaveCount = Results[0].waveCount;

Comment on lines +11088 to +11089
uint32_t LaneCount = Results[0].laneCount;
uint32_t MinWaves = (NumThreads + LaneCount - 1) / LaneCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint32_t LaneCount = Results[0].laneCount;
uint32_t MinWaves = (NumThreads + LaneCount - 1) / LaneCount;
const uint32_t LaneCount = Results[0].laneCount;
const uint32_t MinWaves = (NumThreads + LaneCount - 1) / LaneCount;


// Verify waveCount is uniform across all threads and >= 1.
uint32_t WaveCount = Results[0].waveCount;
VERIFY_IS_GREATER_THAN_OR_EQUAL(WaveCount, (uint32_t)1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VERIFY_IS_GREATER_THAN_OR_EQUAL(WaveCount, (uint32_t)1);
VERIFY_IS_GREATER_THAN_OR_EQUAL(WaveCount, 1u);

std::set<uint32_t> SeenWaveIndices;
for (auto &WavePair : Waves) {
const std::vector<const GroupWaveData *> &Lanes = WavePair.second;
VERIFY_IS_GREATER_THAN_OR_EQUAL(Lanes.size(), (size_t)1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VERIFY_IS_GREATER_THAN_OR_EQUAL(Lanes.size(), (size_t)1);
VERIFY_IS_GREATER_THAN_OR_EQUAL(Lanes.size(), 1u);

}

// Verify all wave indices from 0 to waveCount-1 are present.
VERIFY_ARE_EQUAL((uint32_t)SeenWaveIndices.size(), WaveCount);
Copy link
Contributor

@alsepkow alsepkow Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VERIFY_ARE_EQUAL((uint32_t)SeenWaveIndices.size(), WaveCount);
VERIFY_ARE_EQUAL(SeenWaveIndices.size(), static_cast<size_t>(WaveCount));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is suggesting what you want to suggest:

  • the parens aren't balanced
  • if we're going to cast, then we should cast them to the same type - size() is size_t.

Maybe you wanted to suggest:

VERIFY_ARE_EQUAL(SeenWaveIndices.size(), static_cast<size_t>(WaveCount));

}

// Verify number of distinct waves matches waveCount.
VERIFY_ARE_EQUAL((uint32_t)Waves.size(), WaveCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VERIFY_ARE_EQUAL((uint32_t)Waves.size(), WaveCount);
VERIFY_ARE_EQUAL(Waves.size(), static_cast<size_t>(WaveCount));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: New

Development

Successfully merging this pull request may close these issues.

4 participants