xiver77 commented Jul 14, 2022

I also emailed you (the author) about this issue a while ago.

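The original code has a dependent chain of shift -> xor -> shift, while the modified code computes two independent shifts and then XORs the results (shift | shift -> xor). Because neither shift depends on the other, the two can execute in parallel.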

Clang (14.0) performs this optimization even on the original code, but GCC (12.1) does not, so the manual rewrite produces better code with GCC.

You can compare the machine code output (https://godbolt.org/z/5bMjdsYnf).
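
For illustration, here is a minimal sketch of the kind of rewrite described above. The shift amounts (18 and 27), the 64-bit input, and the function names `mix_chained` and `mix_parallel` are assumptions chosen for the example; they are not taken from this pull request's diff. For unsigned values, a right shift distributes over xor, so `((x >> a) ^ x) >> b` equals `(x >> (a + b)) ^ (x >> b)` as long as `a + b` is smaller than the bit width.

```c
#include <stdint.h>

/* Dependent chain: shift -> xor -> shift.
   The second shift cannot start until the xor has finished.
   Shift amounts are hypothetical, for illustration only. */
static uint32_t mix_chained(uint64_t x) {
    return (uint32_t)(((x >> 18) ^ x) >> 27);
}

/* Equivalent form: two independent shifts feeding one xor.
   (x >> 18) >> 27 == x >> 45, so both shifts read x directly
   and can be issued in the same cycle. */
static uint32_t mix_parallel(uint64_t x) {
    return (uint32_t)((x >> 45) ^ (x >> 27));
}
```

Both functions return the same value for every input; only the shape of the dependency graph differs, which is what affects the generated code when the compiler does not do the rewrite itself.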
