-
Notifications
You must be signed in to change notification settings - Fork 5.4k
JIT: Rewrite BlendVariableMask when mask is created from vector #126062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -638,7 +638,8 @@ void Rationalizer::RewriteHWIntrinsicBlendv(GenTree** use, Compiler::GenTreeStac | |||||||
| return; | ||||||||
| } | ||||||||
|
|
||||||||
| GenTree* op2 = node->Op(2); | ||||||||
| GenTree* op2 = node->Op(2); | ||||||||
| GenTree*& op3 = node->Op(3); | ||||||||
|
|
||||||||
| // We're in the post-order visit and are traversing in execution order, so | ||||||||
| // everything between op2 and node will have already been rewritten to LIR | ||||||||
|
|
@@ -655,20 +656,25 @@ void Rationalizer::RewriteHWIntrinsicBlendv(GenTree** use, Compiler::GenTreeStac | |||||||
|
|
||||||||
| if (op2->isEmbeddedMaskingCompatible(m_compiler, tgtMaskSize, tgtSimdBaseType)) | ||||||||
| { | ||||||||
| // We are going to utilize the embedded mask, so we don't need to rewrite. However, | ||||||||
| // we want to fixup the simdBaseType here since it simplifies lowering and allows | ||||||||
| // both embedded broadcast and the mask to be live simultaneously. | ||||||||
| // Make sure we had a mask to begin with. We don't want to create a mask | ||||||||
| // solely for the purpose of embedding it. | ||||||||
|
|
||||||||
| if (tgtSimdBaseType != TYP_UNDEF) | ||||||||
| if (!op3->OperIsHWIntrinsic() || | ||||||||
| (op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask)) | ||||||||
|
Comment on lines
+662
to
+663
|
||||||||
| if (!op3->OperIsHWIntrinsic() || | |
| (op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask)) | |
| if (!op3->OperIsConvertVectorToMask()) |
Copilot
AI
Apr 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes when BlendVariableMask is rewritten back to BlendVariable (and can eliminate a ConvertVectorToMask), but I couldn't find an existing JIT codegen test covering this scenario (no BlendVariableMask hits under src/tests). Consider adding an asm pattern test to prevent regressions for Vector128/256 cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs an elaboration covering why as it's operating under the presumption that
vpmov*2mis more expensive thanvblendvps, which can depend on the hardware and may change in the futureFor example, this is what we have today where it may be slower or may be similar perf (using
XMM/YMM/ZMMandvpmov*2mvsvblendvps):3/4/5vs13/4/6vs23vs1-23vs2-3Then there is also the case where we don't have a
vpblendvwandvpblendvbhas slightly different behavior here, so it might not be valid to switch back. Consider that0x8000selects a whole word but needs to be0x8080if usingvpblendvb.