Skip quadBroadcast/quadSwap split tests when subgroup_size < 8 by gyagp · Pull Request #4656 · gpuweb/cts

gyagp · 2026-06-02T08:10:35Z

The compute,split tests use the predicate id < subgroupSize / 2, which deactivates the upper half of each subgroup. When subgroupSize is 8 or greater the boundary lands on a 4-lane multiple, so every quad stays fully active and the test exercises legitimate predicated quad operations. When subgroupSize is 4, the only quad gets bisected, leaving no fully active quad — calling quadBroadcast/quadSwap there is undefined behavior and the result is not meaningful to validate.

GPUAdapterInfo.subgroupMinSize is not a sufficient pre-check: the size the implementation actually selects depends on the compiled shader (workgroup shape, register pressure, vendor heuristics), not just the adapter's minimum. Two implementations from the same vendor can report identical {min,max} and pick different runtime sizes for the same shader. Reading @Builtin(subgroup_size) from inside the test shader is the only reliable signal.

Issue: #4650

Requirements for PR author:

~~All missing test coverage is tracked with "TODO" or .unimplemented().~~
~~New helpers are /** documented */ and new helper files are found in helper_index.txt.~~
Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)
~~Test have be tested with compatibility mode validation enabled and behave as expected. (If not passing, explain above.)~~

Requirements for reviewer sign-off:

Tests are properly located.
Test descriptions are accurate and complete.
Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
Tests avoid over-parameterization (see case count report).

When landing this PR, be sure to make any necessary issue status updates.

jenatali · 2026-06-02T15:46:32Z

+    // 4-thread workgroup (as WARP does). With subgroup size 4 and a
+    // split predicate, only 2 lanes remain active in the only quad,
+    // leaving no fully active quad (undefined behavior).
+    if (wgThreads === 4 && t.device.adapterInfo.subgroupMinSize <= 4) {


Worth also skipping unconditionally if max subgroup size is 4, since that would split every quad as well? Though, WARP still only chooses native wave sizes without explicit requests to go wider, so on arm64 systems, I think it would pick subgroup size 4 since it only supports NEON (and even with SVE there's no real systems that support vector length > 128 bit).

So I think this might need to be a skip from inspecting the subgroup size as reported by running the shader itself, rather than a pre-emptive skip.

That's simple enough as we already capture that data in the shader.

When the implementation selects a subgroup size < 8 for the test's workgroup, the split predicate `id < subgroupSize / 2` bisects the only quad in the subgroup, leaving no fully active quad — which is undefined behavior for quad operations. This is observed on WARP (which selects its native D3D12 wave size: 4 on arm64 NEON, often 4 on x86 for small workgroups) and may occur on any implementation that picks a small native subgroup size at runtime. Two coordinated guards: * In the shader, the quad call is wrapped in `if subgroupSize >= 8u { ... }` so it never executes when the split predicate would be unsafe. * In the JS checker, the actual subgroupSize is read out of metadata.subgroup_size[0] and the test is skipped with t.skip when it is < 8, so the missing output doesn't get flagged as a failure. Querying GPUAdapterInfo.subgroupMinSize would not be sufficient: the size the implementation actually selects depends on the shader (its workgroup size, register pressure, etc.), not just the adapter's minimum supported size. Reading subgroupSize from inside the test shader itself is the only reliable signal.

gyagp · 2026-06-04T03:29:18Z

I changed to use the reported subgroup size instead. Below are the test results on various situations.

@workgroup_size	NVIDIA RTX 5080	WARP 1.0.20	WARP 10.0.26100.8246 (Win11 24H2 built-in)
[4,1,1]	32	4	4
[8,1,1]	32	8	4
[16,1,1]	32	16	4
[32,1,1]	32	16	4
[64,1,1]	32	16	4
[256,1,1]	32	16	4
[4,4,4]	32	16	4

All three report adapter.subgroupMinSize=4 (WARP) or 32 (NVIDIA), but the runtime size for the same shader differs across:
* NVIDIA: always 32 — no skip needed.
* WARP 1.0.20: min(wgThreads, 16) — skip only the subgroup size is 4 (6 of the 46 compute,split cases).
* WARP system Windows 11 24H2: always 4 — every case skips.

jenatali reviewed Jun 2, 2026

View reviewed changes

kainino0x requested a review from alan-baker June 2, 2026 16:42

gyagp force-pushed the cts-quadops-warp branch from 29aa5f0 to e1b63f3 Compare June 4, 2026 03:09

gyagp changed the title ~~Skip quadBroadcast/quadSwap split tests when subgroup splits the only quad~~ Skip quadBroadcast/quadSwap split tests when subgroup_size < 8 Jun 4, 2026

Merge branch 'main' into cts-quadops-warp

d75eee4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip quadBroadcast/quadSwap split tests when subgroup_size < 8#4656

Skip quadBroadcast/quadSwap split tests when subgroup_size < 8#4656
gyagp wants to merge 2 commits into
gpuweb:mainfrom
gyagp:cts-quadops-warp

gyagp commented Jun 2, 2026 •

edited

Loading

Uh oh!

jenatali Jun 2, 2026

Uh oh!

alan-baker Jun 2, 2026

Uh oh!

gyagp commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

gyagp commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jenatali Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

alan-baker Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gyagp commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gyagp commented Jun 2, 2026 •

edited

Loading