Skip to content

Skip quadBroadcast/quadSwap split tests when subgroup_size < 8#4656

Open
gyagp wants to merge 2 commits into
gpuweb:mainfrom
gyagp:cts-quadops-warp
Open

Skip quadBroadcast/quadSwap split tests when subgroup_size < 8#4656
gyagp wants to merge 2 commits into
gpuweb:mainfrom
gyagp:cts-quadops-warp

Conversation

@gyagp
Copy link
Copy Markdown

@gyagp gyagp commented Jun 2, 2026

The compute,split tests use the predicate id < subgroupSize / 2, which deactivates the upper half of each subgroup. When subgroupSize is 8 or greater the boundary lands on a 4-lane multiple, so every quad stays fully active and the test exercises legitimate predicated quad operations. When subgroupSize is 4, the only quad gets bisected, leaving no fully active quad — calling quadBroadcast/quadSwap there is undefined behavior and the result is not meaningful to validate.

GPUAdapterInfo.subgroupMinSize is not a sufficient pre-check: the size the implementation actually selects depends on the compiled shader (workgroup shape, register pressure, vendor heuristics), not just the adapter's minimum. Two implementations from the same vendor can report identical {min,max} and pick different runtime sizes for the same shader. Reading @Builtin(subgroup_size) from inside the test shader is the only reliable signal.

Issue: #4650


Requirements for PR author:

  • All missing test coverage is tracked with "TODO" or .unimplemented().
  • New helpers are /** documented */ and new helper files are found in helper_index.txt.
  • Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)
  • Test have be tested with compatibility mode validation enabled and behave as expected. (If not passing, explain above.)

Requirements for reviewer sign-off:

  • Tests are properly located.
  • Test descriptions are accurate and complete.
  • Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
  • Tests avoid over-parameterization (see case count report).

When landing this PR, be sure to make any necessary issue status updates.

// 4-thread workgroup (as WARP does). With subgroup size 4 and a
// split predicate, only 2 lanes remain active in the only quad,
// leaving no fully active quad (undefined behavior).
if (wgThreads === 4 && t.device.adapterInfo.subgroupMinSize <= 4) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth also skipping unconditionally if max subgroup size is 4, since that would split every quad as well? Though, WARP still only chooses native wave sizes without explicit requests to go wider, so on arm64 systems, I think it would pick subgroup size 4 since it only supports NEON (and even with SVE there's no real systems that support vector length > 128 bit).

So I think this might need to be a skip from inspecting the subgroup size as reported by running the shader itself, rather than a pre-emptive skip.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's simple enough as we already capture that data in the shader.

@kainino0x kainino0x requested a review from alan-baker June 2, 2026 16:42
When the implementation selects a subgroup size < 8 for the test's
workgroup, the split predicate `id < subgroupSize / 2` bisects the
only quad in the subgroup, leaving no fully active quad — which is
undefined behavior for quad operations. This is observed on WARP
(which selects its native D3D12 wave size: 4 on arm64 NEON, often 4
on x86 for small workgroups) and may occur on any implementation
that picks a small native subgroup size at runtime.

Two coordinated guards:

  * In the shader, the quad call is wrapped in
    `if subgroupSize >= 8u { ... }` so it never executes when the
    split predicate would be unsafe.
  * In the JS checker, the actual subgroupSize is read out of
    metadata.subgroup_size[0] and the test is skipped with t.skip
    when it is < 8, so the missing output doesn't get flagged as a
    failure.

Querying GPUAdapterInfo.subgroupMinSize would not be sufficient: the
size the implementation actually selects depends on the shader (its
workgroup size, register pressure, etc.), not just the adapter's
minimum supported size. Reading subgroupSize from inside the test
shader itself is the only reliable signal.
@gyagp gyagp force-pushed the cts-quadops-warp branch from 29aa5f0 to e1b63f3 Compare June 4, 2026 03:09
@gyagp gyagp changed the title Skip quadBroadcast/quadSwap split tests when subgroup splits the only quad Skip quadBroadcast/quadSwap split tests when subgroup_size < 8 Jun 4, 2026
@gyagp
Copy link
Copy Markdown
Author

gyagp commented Jun 4, 2026

I changed to use the reported subgroup size instead. Below are the test results on various situations.

@workgroup_size NVIDIA RTX 5080 WARP 1.0.20 WARP 10.0.26100.8246 (Win11 24H2 built-in)
[4,1,1] 32 4 4
[8,1,1] 32 8 4
[16,1,1] 32 16 4
[32,1,1] 32 16 4
[64,1,1] 32 16 4
[256,1,1] 32 16 4
[4,4,4] 32 16 4

All three report adapter.subgroupMinSize=4 (WARP) or 32 (NVIDIA), but the runtime size for the same shader differs across:
* NVIDIA: always 32 — no skip needed.
* WARP 1.0.20: min(wgThreads, 16) — skip only the subgroup size is 4 (6 of the 46 compute,split cases).
* WARP system Windows 11 24H2: always 4 — every case skips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants