JIT: Rewrite BlendVariableMask when mask is created from vector#126062
Open
saucecontrol wants to merge 1 commit intodotnet:mainfrom
Open
JIT: Rewrite BlendVariableMask when mask is created from vector#126062saucecontrol wants to merge 1 commit intodotnet:mainfrom
saucecontrol wants to merge 1 commit intodotnet:mainfrom
Conversation
Contributor
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates JIT rationalization for xarch NI_AVX512_BlendVariableMask to avoid keeping the mask-form blend when the mask operand originates from a vector-to-mask conversion, preventing a deoptimization where a mask is created only to be embedded.
Changes:
- Extend
RewriteHWIntrinsicBlendvto detect when the blend mask is produced viaNI_AVX512_ConvertVectorToMask. - Avoid the “keep embedded mask” early-return in that scenario so the blend can be rewritten back to the non-mask form.
Comments suppressed due to low confidence (1)
src/coreclr/jit/rationalize.cpp:685
op3is now a localGenTree*, but it is later passed by address toRewriteHWIntrinsicToNonMask(&op3, ...).RewriteHWIntrinsicToNonMaskexpectsuseto be the actual operand edge so it can replace/remove nodes (e.g., it removesNI_AVX512_ConvertVectorToMaskand updates the parent viaReplaceOperand). Passing a local pointer meansnode->Op(3)will not be updated, leaving the blend node still pointing at the removed intrinsic (dangling operand / miscompile). UseGenTree*& op3 = node->Op(3);(or otherwise pass&node->Op(3)/ an operand reference) when callingRewriteHWIntrinsicToNonMask.
GenTree* op2 = node->Op(2);
GenTree* op3 = node->Op(3);
// We're in the post-order visit and are traversing in execution order, so
// everything between op2 and node will have already been rewritten to LIR
// form and doing the IsInvariantInRange check is safe. This allows us to
// catch cases where something is embedded masking compatible but where we
// could never actually contain it and so we want to rewrite it to the non-mask
// variant
SideEffectSet scratchSideEffects;
if (scratchSideEffects.IsLirInvariantInRange(m_compiler, op2, node))
{
unsigned tgtMaskSize = simdSize / genTypeSize(simdBaseType);
var_types tgtSimdBaseType = TYP_UNDEF;
if (op2->isEmbeddedMaskingCompatible(m_compiler, tgtMaskSize, tgtSimdBaseType))
{
// Make sure we had a mask to begin with. We don't want to create a mask
// solely for the purpose of embedding it.
if (!op3->OperIsHWIntrinsic() ||
(op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask))
{
// We are going to utilize the embedded mask, so we don't need to rewrite. However,
// we want to fixup the simdBaseType here since it simplifies lowering and allows
// both embedded broadcast and the mask to be live simultaneously.
if (tgtSimdBaseType != TYP_UNDEF)
{
op2->AsHWIntrinsic()->SetSimdBaseType(tgtSimdBaseType);
}
return;
}
}
}
if (!ShouldRewriteToNonMaskHWIntrinsic(op3))
{
return;
}
parents.Push(op3);
RewriteHWIntrinsicToNonMask(&op3, parents);
(void)parents.Pop();
Member
Author
|
cc @dotnet/jit-contrib SPMI doesn't show any diffs, but this does fix up my motivating case. Something like: static Vector128<float> AddToNegative(Vector128<float> v1, Vector128<float> v2)
=> Sse41.BlendVariable(v1, v1 + v2, v1); vmovups xmm0, xmmword ptr [rdx]
- vpmovd2m k1, xmm0
- vaddps xmm0 {k1}, xmm0, xmmword ptr [r8]
+ vaddps xmm1, xmm0, xmmword ptr [r8]
+ vblendvps xmm0, xmm0, xmm1, xmm0
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
ret
-; Total bytes of code: 24
+; Total bytes of code: 23 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This catches cases where
BlendVariableis 'upgraded' toBlendVariableMaskon import but the blend mask was notTYP_MASK.There is logic in place that checks whether the blend could be used as an EVEX embedded mask and rewrites back to
BlendVariableif not. However, it misses cases where the mask is created from a vector anyway, and creating the mask just to embed it is a deoptimization.