Improving avx/avx2 swizzles#1141
Conversation
serge-sans-paille
left a comment
There was a problem hiding this comment.
If you could reduce code duplication by syndicating patterns as suggested, that would be great!
|
If this passes the tests it can be merged. I hope I did not break any build this time. Should I add myself to the copyright? |
|
You can add yourself, yes. And rebase on master branch ;-) |
8239b5e to
cfb5838
Compare
|
I rebased. I did not squash as I have some commits that I would like to keep for reference in the branch. Feel free to squash and merge in master. |
|
Hey @DiamonDinoia : I'll have a look at this next week, don't worry if you don't have any feedback since then 🙇 |
ec97ae8 to
e99877d
Compare
e99877d to
c6b58b6
Compare
|
I am not sure what the problem is with PowerPC. It should not touch any code I wrote and from the print the results seems correct. |
|
How strange: the failure shows batches that... look the same /o\ |
|
I could not find an explanation for it. |
|
If you don't mind, I'll split your commit history in pieces and validate them step-by-step in order to understand what's happening - keeping your authorship information, of course. |
|
Sure, no problem, go ahead! |
c6b58b6 to
b285dbd
Compare
|
perfect, can you just squash your commits into a nice history? Thanks o/ |
Fixes: some swizzled did not allow duplicates in the output
0a906e5 to
8b1c9b9
Compare
|
Thanks for being patient and for the great patch set \o/ |
|
See ##1155 for the sse2 part |
My pleasure! |
| // The intrinsic does NOT allow to copy the same element of the source vector to more than one element of the destination vector. | ||
| // one-shot 8-lane permute |
There was a problem hiding this comment.
Was this tested concretely? According to the intel ISA reference for VPERMPS,
this instruction permits a doubleword in the source operand to be copied to more than one location in the destination operand
There was a problem hiding this comment.
It would be surprising if it was different from _mm256_permutevar8x32_epi32, in any case
There was a problem hiding this comment.
There was a webpage once but intel changed their website and I cannot find it anymore
There was a problem hiding this comment.
It is the same for _mm256_permutevar8x32_epi32 If I use that with duplicates I introduced a bug.
There was a problem hiding this comment.
There was a problem with duplicates originally, that's why I investigated. But I think it was showing up with sse
There was a problem hiding this comment.
(@AntoinePrv you could be interested in this discussion)
There was a problem hiding this comment.
Happy to revert the changes since ISA says that it is allowed. Would you like to open a PR or issue?
There was a problem hiding this comment.
I could try to, but I'm not a xsimd developer (I just encountered this by chance when grepping for xsimd's support for shuffling).
There was a problem hiding this comment.
I can do it once I have time otherwise.

Hi,
This PR optimzes swizzles where possible
Fixed a bug in the float avx2 swizzle which did not allow duplicates
Adds more tests