Add SVE implementation of replace #6195

Open
hazzlim wants to merge 8 commits into microsoft:main from hazzlim:replace-sve-pr

@hazzlim (Contributor) commented Mar 31, 2026

This PR adds an SVE implementation of replace. This algorithm was previously not vectorized using Neon, due to the absence of masked stores in the instruction set. See #4433 for why this is an issue.
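
For readers unfamiliar with the approach, the following is a minimal sketch of how a predicated (masked) store lets `replace` write only the elements that matched. It is an illustration under stated assumptions, not the code in this PR: `replace_u32_sketch` is a hypothetical name, only 32-bit elements are handled, and a scalar fallback is used when the compiler is not targeting SVE. The intrinsics are the ACLE ones declared in `<arm_sve.h>`.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper for illustration only; not the STL's internal function.
void replace_u32_sketch(std::uint32_t* first, std::size_t count,
                        std::uint32_t old_val, std::uint32_t new_val) {
#if defined(__ARM_FEATURE_SVE)
    // Broadcast the replacement value to every lane.
    svuint32_t replacement = svdup_n_u32(new_val);
    for (std::size_t i = 0; i < count; i += svcntw()) {
        // Predicate covering the lanes still in range; this also handles the tail.
        svbool_t in_range = svwhilelt_b32_u64(static_cast<std::uint64_t>(i),
                                              static_cast<std::uint64_t>(count));
        // Predicated load: inactive lanes are not read.
        svuint32_t data = svld1_u32(in_range, first + i);
        // Lanes that are in range and equal to old_val.
        svbool_t match = svcmpeq_n_u32(in_range, data, old_val);
        // Masked store: only the matching lanes are written.
        svst1_u32(match, first + i, replacement);
    }
#else
    // Scalar fallback so the sketch builds on targets without SVE.
    for (std::size_t i = 0; i < count; ++i) {
        if (first[i] == old_val) {
            first[i] = new_val;
        }
    }
#endif
}
```

Because the store is predicated on `match`, lanes that did not compare equal are never written, which a plain load, select, and full-width store cannot guarantee. The same `svwhilelt` predicate also covers the final partial vector, so no separate scalar tail loop is needed in this sketch.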

Benchmark results ⏲️

Results are speedup values relative to the existing C code as a baseline - higher is better. Benchmark results were obtained running on a Neoverse N2 machine.

| | MSVC Speedup | Clang Speedup |
|---|---|---|
| r<std::uint8_t> | 17.03 | 7.024 |
| r<std::uint16_t> | 10.17 | 3.767 |
| r<std::uint32_t> | 4.592 | 2.109 |
| r<std::uint64_t> | 2.475 | 1.23 |

@hazzlim requested a review from a team as a code owner March 31, 2026 22:18
@github-project-automation (bot) moved this to Initial Review in STL Code Reviews Mar 31, 2026
Comment thread stl/inc/algorithm (outdated)
@StephanTLavavej added the labels performance (Must go faster), ARM64 (Related to the ARM64 architecture), and ARM64EC (I can't believe it's not x64!) Mar 31, 2026
@StephanTLavavej self-assigned this Apr 2, 2026
Comment thread stl/src/vector_algorithms.cpp (outdated)
Comment thread stl/src/vector_algorithms.cpp (outdated)
Comment thread stl/src/vector_algorithms.cpp (outdated)
@StephanTLavavej removed their assignment Apr 3, 2026
@StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Apr 3, 2026
@StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Apr 15, 2026
@StephanTLavavej

This comment was marked as outdated.

@StephanTLavavej moved this from Merging to Blocked in STL Code Reviews Apr 16, 2026
@StephanTLavavej added the label blocked (Something is preventing work on this) Apr 16, 2026
@StephanTLavavej (Member) commented

Turns out I can't merge this until the MSVC-internal checked-in compiler is updated. The current 14.50 compiler can't understand the new <arm_sve.h> and ICEs quite horribly, and I don't see any possible workaround. (vector_algorithms.cpp has to be built by the checked-in compiler.) The good news is that we should only have to wait a month.
