Narrow blanket SPIR-V loop unroll in optimizer recipes #2

Draft
AnastaZIuk wants to merge 2 commits into main from unroll

AnastaZIuk commented Mar 20, 2026

Summary

  • narrow legalization-time full loop unroll behind an explicit opt-in overload
  • narrow legalization-time SSA rewrite behind an explicit opt-in overload
  • stop materializing blanket full loop unroll in the default performance recipe
  • replace the two heavy global redundancy elimination passes in the default performance recipe with local redundancy elimination
  • remove several additional blanket cleanup steps from the legalization tail when the generic path does not need them
  • use the DXC trunk Godbolt reproducer, which is the preprocessed output of our path tracer at about 58k LoC

Root cause

The current SPIR-V optimizer recipes still carry two old blanket unroll decisions:

  • 9fbcce4ca17d added full loop unroll to legalization passes on 2018-09-19
  • 3c47dac28208 added full loop unroll to performance passes on 2020-05-20

On a large preprocessed HLSL payload with many small [unroll] loops, this inflates the SPIR-V module far more than necessary, and the optimizer then pays for expensive cleanup over that self-inflated IR.

LoopControl::Unroll as an IR hint is not the problem. The expensive part is treating that hint as a blanket request to immediately materialize full unroll in the generic optimizer path even when legality does not require it.
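To make the inflation concrete, here is a toy size model, not a measurement of the real payload. It assumes each loop is described by a hypothetical `(body, trip, overhead)` tuple of instruction counts and compares keeping every loop rolled against fully unrolling every loop. The loop count and sizes are invented for illustration only.

```python
def rolled_size(loops):
    """Instruction count if every loop stays rolled: one body plus loop overhead."""
    return sum(body + overhead for body, _trip, overhead in loops)


def unrolled_size(loops):
    """Instruction count if every loop is fully unrolled: one body copy per iteration."""
    return sum(body * trip for body, trip, _overhead in loops)


# Hypothetical module: 2000 small loops, each with a 12-instruction body,
# 8 iterations, and 4 instructions of loop overhead. These numbers are
# assumptions chosen for illustration, not taken from the path tracer.
loops = [(12, 8, 4)] * 2000

print(rolled_size(loops))    # 32000
print(unrolled_size(loops))  # 192000
```

In this toy case the unrolled module is six times larger, and every later cleanup pass has to walk that larger IR.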

A similar issue existed in the legalization tail. Some cleanup passes were effectively historical safety hammers rather than semantically required defaults. Narrowing them keeps the generic path correct while removing a large amount of unnecessary work.

DXC has the producer-side lowering context and knows when a specific HLSL pattern still requires materialized loop unroll or legalize-time SSA rewrite for correctness. The companion DXC patch in microsoft/DirectXShaderCompiler#8283 supplies that narrower signal, and its current branch head also materializes the companion SPIR-V submodule pointers.
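The opt-in shape described above can be sketched roughly as follows. This is a minimal Python model, not the SPIRV-Tools C++ API: `LegalizationOptions`, `legalization_recipe`, and the pass labels are all hypothetical stand-ins, and the real recipes contain different and many more passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalizationOptions:
    """Hypothetical opt-in flags; both default to off."""
    force_full_unroll: bool = False
    force_ssa_rewrite: bool = False


def legalization_recipe(options=LegalizationOptions()):
    """Build an ordered list of pass labels (placeholders, not real pass names)."""
    passes = ["baseline-legalization"]
    # The heavy passes are appended only when the producer asks for them.
    if options.force_ssa_rewrite:
        passes.append("ssa-rewrite")
    if options.force_full_unroll:
        passes.append("full-loop-unroll")
    passes.append("local-cleanup")
    return passes


# Generic path: no blanket unroll, no blanket SSA rewrite.
print(legalization_recipe())

# A producer that knows a pattern needs unrolling opts in explicitly.
print(legalization_recipe(LegalizationOptions(force_full_unroll=True)))
```

The point of the design is that the default call stays cheap, while a producer with more lowering context can still request the old behavior where correctness depends on it.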

Validation

  • reproducer: godbolt.org/z/o5xf1hq36 (note: Compiler Explorer cache can make repeated runs look much faster than a cold compile)
  • shader payload: preprocessed output of our path tracer at about 58k LoC
  • local machine: AMD Ryzen 5 5600G with Radeon Graphics, 6 physical cores, 12 logical processors, Windows-reported max clock 3901 MHz
  • on the same payload and the same machine, SPIRV-Tools@487ff843bd8a + DXC@bd9a8b1c5365 reduced the compile time from 19.161 s to 6.042 s
  • with SPIRV-Tools@57007cf46bb4 + DXC@b02b772e0b50, the same payload measured 4.702 s
  • with the current branch pair SPIRV-Tools@7134be5024ff + DXC@55112e338fd2, the same payload now measures 2.464 s
  • full local CodeGenSPIRV lit/FileCheck passes with the companion DXC branch: 1403 expected passes, 2 expected failures, 0 unexpected failures
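For reference, the relative speedups implied by the timings above can be computed with a short script. It assumes the 19.161 s figure is the common baseline for all three branch pairs, which the list states explicitly only for the first pair.

```python
# Timings in seconds, copied from the validation list above.
baseline_s = 19.161
measurements_s = {
    "SPIRV-Tools@487ff843bd8a + DXC@bd9a8b1c5365": 6.042,
    "SPIRV-Tools@57007cf46bb4 + DXC@b02b772e0b50": 4.702,
    "SPIRV-Tools@7134be5024ff + DXC@55112e338fd2": 2.464,
}


def speedup(baseline, measured):
    """Ratio of baseline time to measured time (higher is faster)."""
    return baseline / measured


for label, seconds in measurements_s.items():
    print(f"{label}: {seconds:.3f} s, about {speedup(baseline_s, seconds):.1f}x faster")
```

Under that assumption, the three pairs come out at roughly 3.2x, 4.1x, and 7.8x faster than the baseline.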

Companion DXC PR:
microsoft/DirectXShaderCompiler#8283
