AVX swizzle broadcast and swap optimization #1213
Merged
serge-sans-paille merged 5 commits into xtensor-stack:master (Nov 20, 2025)
Conversation
constexpr batch_bool_constant<uint32_t, A, (V0 >= 4), (V1 >= 4), (V2 >= 4), (V3 >= 4), (V4 >= 4), (V5 >= 4), (V6 >= 4), (V7 >= 4)> lane_mask {};
// select lane by the mask index divided by 4
constexpr auto lane = batch_constant<uint32_t, A, 0, 0, 0, 0, 1, 1, 1, 1> {};
constexpr int lane_idx = ((mask / make_batch_constant<uint32_t, 4, A>()) != lane).mask();
Contributor
I have difficulty seeing how the former `lane_mask = V_i >= 4` is equivalent to `V_i / 4 != lane[i]`.
Why isn't it just `lane_mask >= make_batch_constant<uint32_t, 4, A>()`?
Contributor (Author)
Because `r0` and `r1` do not contain the same values as before:
- before: `r0` contains items from the low half in both lanes and `r1` contains items from the high half in both lanes
- after: each `r0` lane contains items from its own lane while each `r1` lane contains items from the other lane

For instance, before the change an index 0 in the second lane must be selected from `r0` (low values), while after it must be selected from `r1` (other lane).
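For illustration, here is a minimal scalar sketch of that selection rule (an editor's example with assumed indices `V`, not the xsimd implementation itself): output element `i` is taken from `r0` when `V[i] / 4 == i / 4` and from `r1` otherwise.

```cpp
// Editor's sketch: a scalar model of the lane-selection rule (assumed example
// values, not the actual AVX code path). Two 4-element "lanes" stand in for
// the 128-bit halves of a 256-bit register.
#include <array>
#include <cassert>

int main()
{
    std::array<int, 8> src = { 10, 11, 12, 13, 14, 15, 16, 17 };
    std::array<unsigned, 8> V = { 4, 1, 6, 3, 0, 5, 2, 7 }; // assumed swizzle indices

    // "After" layout from the comment above:
    //   r0[i]: element V[i] % 4 taken from i's own lane,
    //   r1[i]: element V[i] % 4 taken from the other lane.
    std::array<int, 8> r0, r1, dst;
    for (unsigned i = 0; i < 8; ++i)
    {
        unsigned lane = i / 4;   // destination lane of element i
        unsigned sub = V[i] % 4; // in-lane position of the wanted element
        r0[i] = src[lane * 4 + sub];
        r1[i] = src[(1 - lane) * 4 + sub];
    }

    for (unsigned i = 0; i < 8; ++i)
    {
        // New test: the source lane (V[i] / 4) differs from the destination
        // lane (i / 4) -> take r1, otherwise take r0. The old "V[i] >= 4" test
        // only works when r0/r1 hold the low/high halves in *both* lanes.
        dst[i] = (V[i] / 4 != i / 4) ? r1[i] : r0[i];
        assert(dst[i] == src[V[i]]);
    }
}
```

The assertion checks that the selection reproduces `src[V[i]]` for every output position.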
Contributor
And this saves a few permutes, perfect!
constexpr batch_bool_constant<uint64_t, A, (V0 >= 2), (V1 >= 2), (V2 >= 2), (V3 >= 2)> blend_mask;
// select lane by the mask index divided by 2
constexpr auto lane = batch_constant<uint64_t, A, 0, 0, 1, 1> {};
constexpr int lane_idx = ((mask / make_batch_constant<uint64_t, 2, A>()) != lane).mask();
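For the 4 × 64-bit variant the same arithmetic applies with a lane width of 2. As a worked example with assumed indices `(V0, V1, V2, V3) = (0, 3, 2, 1)`: `mask / 2 = (0, 1, 1, 0)` and `lane = (0, 0, 1, 1)`, so the comparison `(mask / 2) != lane` gives `(0, 1, 0, 1)`, i.e. `lane_idx = 0b1010` (assuming `.mask()` packs element 0 into bit 0), while `blend_mask = (V_i >= 2) = (0, 1, 1, 0)`.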
Similar optimizations to #1201 but for AVX