Skip quadBroadcast/quadSwap split tests when subgroup_size < 8#4656
Skip quadBroadcast/quadSwap split tests when subgroup_size < 8#4656gyagp wants to merge 2 commits into
Conversation
| // 4-thread workgroup (as WARP does). With subgroup size 4 and a | ||
| // split predicate, only 2 lanes remain active in the only quad, | ||
| // leaving no fully active quad (undefined behavior). | ||
| if (wgThreads === 4 && t.device.adapterInfo.subgroupMinSize <= 4) { |
There was a problem hiding this comment.
Worth also skipping unconditionally if max subgroup size is 4, since that would split every quad as well? Though, WARP still only chooses native wave sizes without explicit requests to go wider, so on arm64 systems, I think it would pick subgroup size 4 since it only supports NEON (and even with SVE there's no real systems that support vector length > 128 bit).
So I think this might need to be a skip from inspecting the subgroup size as reported by running the shader itself, rather than a pre-emptive skip.
There was a problem hiding this comment.
That's simple enough as we already capture that data in the shader.
When the implementation selects a subgroup size < 8 for the test's
workgroup, the split predicate `id < subgroupSize / 2` bisects the
only quad in the subgroup, leaving no fully active quad — which is
undefined behavior for quad operations. This is observed on WARP
(which selects its native D3D12 wave size: 4 on arm64 NEON, often 4
on x86 for small workgroups) and may occur on any implementation
that picks a small native subgroup size at runtime.
Two coordinated guards:
* In the shader, the quad call is wrapped in
`if subgroupSize >= 8u { ... }` so it never executes when the
split predicate would be unsafe.
* In the JS checker, the actual subgroupSize is read out of
metadata.subgroup_size[0] and the test is skipped with t.skip
when it is < 8, so the missing output doesn't get flagged as a
failure.
Querying GPUAdapterInfo.subgroupMinSize would not be sufficient: the
size the implementation actually selects depends on the shader (its
workgroup size, register pressure, etc.), not just the adapter's
minimum supported size. Reading subgroupSize from inside the test
shader itself is the only reliable signal.
|
I changed to use the reported subgroup size instead. Below are the test results on various situations.
All three report adapter.subgroupMinSize=4 (WARP) or 32 (NVIDIA), but the runtime size for the same shader differs across: |
The compute,split tests use the predicate
id < subgroupSize / 2, which deactivates the upper half of each subgroup. When subgroupSize is 8 or greater the boundary lands on a 4-lane multiple, so every quad stays fully active and the test exercises legitimate predicated quad operations. When subgroupSize is 4, the only quad gets bisected, leaving no fully active quad — calling quadBroadcast/quadSwap there is undefined behavior and the result is not meaningful to validate.GPUAdapterInfo.subgroupMinSize is not a sufficient pre-check: the size the implementation actually selects depends on the compiled shader (workgroup shape, register pressure, vendor heuristics), not just the adapter's minimum. Two implementations from the same vendor can report identical {min,max} and pick different runtime sizes for the same shader. Reading @Builtin(subgroup_size) from inside the test shader is the only reliable signal.
Issue: #4650
Requirements for PR author:
All missing test coverage is tracked with "TODO" or.unimplemented().New helpers are/** documented */and new helper files are found inhelper_index.txt.Test have be tested with compatibility mode validation enabled and behave as expected. (If not passing, explain above.)Requirements for reviewer sign-off:
When landing this PR, be sure to make any necessary issue status updates.