tex3d left a comment
Looks fine, but I have a suggestion to make the constant case less of a no-op test, and I suggest trying 3 active threads for a non-constant scalar.
    Out4[TID.x] = WaveActiveBitXor(V);

    // constant folding uint4
    Out5[TID.x] = WaveActiveBitXor(uint4(1,2,3,4));
An even number of identical values XOR'd together is always zero, so expecting all zeros doesn't really test much.
There are also no tests that use an odd number of active lanes and check that a bit set in every lane is still set in the result.
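The parity argument can be checked on the CPU with a few lines of Python. This is only an illustrative sketch, not part of the test suite; `xor_of_copies` is a hypothetical helper name, and it models a wave as a plain list of identical per-lane values.

```python
from functools import reduce
from operator import xor

def xor_of_copies(value, lane_count):
    """XOR `lane_count` copies of the same integer together."""
    return reduce(xor, [value] * lane_count, 0)

# Even lane counts cancel to zero; odd lane counts give the value back.
for lanes in range(1, 5):
    print(lanes, [hex(xor_of_copies(v, lanes)) for v in (1, 2, 3, 4)])
```

With 4 lanes every component cancels, which is why the original constant-folding test can only ever expect zeros.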
You could do something more interesting, like:
    Out5[TID.x] = WaveActiveBitXor(uint4(1,2,3,4));
    if (TID.x != 1)
      Out5[TID.x + 4] = WaveActiveBitXor(uint4(1,2,3,4));
    if (TID.x % 2)
      Out5[TID.x + 4 * 2] = WaveActiveBitXor(uint4(1,2,3,4));
    if (TID.x == 1)
      Out5[4 * 3] = WaveActiveBitXor(uint4(1,2,3,4));
Which should result in:
    Data: [
      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, // 4 threads (0,1,2,3)
      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
      0x1, 0x2, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, // 3 threads (0,2,3)
      0x1, 0x2, 0x3, 0x4, 0x1, 0x2, 0x3, 0x4,
      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, // 2 threads (1,3)
      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
      0x1, 0x2, 0x3, 0x4 // 1 thread (1)
    ]

It would also be more interesting to do something like the 3-thread case with non-constant inputs (it could be just a scalar at that point).
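As a sanity check on the expected data, here is a rough CPU model of the suggested stores, written in Python rather than HLSL. It assumes a single wave of 4 lanes, a zero-initialised `Out5` of 13 `uint4` elements, and that each `if` simply narrows the active lane set. The helper names are hypothetical, and the per-lane input `1 << t` in `three_lane_scalar` is an invented example of a non-constant scalar, not something from the PR.

```python
from functools import reduce
from operator import xor

LANES = 4             # assumed wave size, matching the "4 threads" comment
VALUE = (1, 2, 3, 4)  # the constant uint4 passed to WaveActiveBitXor

def wave_active_bit_xor(active_values):
    """XOR-reduce each uint4 component across the active lanes."""
    return tuple(reduce(xor, comp, 0) for comp in zip(*active_values))

def expected_out5():
    out = [(0, 0, 0, 0)] * 13  # Out5[0..12], assumed zero-initialised
    # (lane predicate, output index) pairs mirroring the four stores
    blocks = [
        (lambda t: True,       lambda t: t),
        (lambda t: t != 1,     lambda t: t + 4),
        (lambda t: t % 2 != 0, lambda t: t + 4 * 2),
        (lambda t: t == 1,     lambda t: 4 * 3),
    ]
    for is_active, index in blocks:
        active = [t for t in range(LANES) if is_active(t)]
        result = wave_active_bit_xor([VALUE] * len(active))
        for t in active:
            out[index(t)] = result
    return [c for vec in out for c in vec]  # flatten to the Data layout

def three_lane_scalar():
    """Hypothetical non-constant input: lane t contributes bit t, lane 1 inactive."""
    active = [t for t in range(LANES) if t != 1]
    return reduce(xor, (1 << t for t in active), 0)

print([hex(v) for v in expected_out5()])
print(hex(three_lane_scalar()))
```

In this model the `TID.x % 2` branch is taken by the odd lanes, so only the stores at offsets 9 and 11 happen, and both write zeros. The scalar variant keeps one distinct bit per active lane, so a dropped or extra lane would show up directly in the result.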
bogner left a comment
Tex's suggestions for slightly better test coverage are good; otherwise LGTM.
This PR adds tests for WaveActiveBitXor.
WaveActiveBitXor only accepts uint and uint64 types.
The PR also adds a control flow test, since this operation is convergent.
Fixes #896