Skip to content

JIT: Rewrite BlendVariableMask when mask is created from vector#126062

Open
saucecontrol wants to merge 1 commit intodotnet:mainfrom
saucecontrol:less-mask
Open

JIT: Rewrite BlendVariableMask when mask is created from vector#126062
saucecontrol wants to merge 1 commit intodotnet:mainfrom
saucecontrol:less-mask

Conversation

@saucecontrol
Copy link
Copy Markdown
Member

@saucecontrol saucecontrol commented Mar 24, 2026

This catches cases where BlendVariable is 'upgraded' to BlendVariableMask on import but the blend mask was not TYP_MASK.

There is logic in place that checks whether the blend could be used as an EVEX embedded mask and rewrites back to BlendVariable if not. However, it misses cases where the mask is created from a vector anyway, and creating the mask just to embed it is a deoptimization.

Copilot AI review requested due to automatic review settings March 24, 2026 23:11
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 24, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 24, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates JIT rationalization for xarch NI_AVX512_BlendVariableMask to avoid keeping the mask-form blend when the mask operand originates from a vector-to-mask conversion, preventing a deoptimization where a mask is created only to be embedded.

Changes:

  • Extend RewriteHWIntrinsicBlendv to detect when the blend mask is produced via NI_AVX512_ConvertVectorToMask.
  • Avoid the “keep embedded mask” early-return in that scenario so the blend can be rewritten back to the non-mask form.
Comments suppressed due to low confidence (1)

src/coreclr/jit/rationalize.cpp:685

  • op3 is now a local GenTree*, but it is later passed by address to RewriteHWIntrinsicToNonMask(&op3, ...). RewriteHWIntrinsicToNonMask expects use to be the actual operand edge so it can replace/remove nodes (e.g., it removes NI_AVX512_ConvertVectorToMask and updates the parent via ReplaceOperand). Passing a local pointer means node->Op(3) will not be updated, leaving the blend node still pointing at the removed intrinsic (dangling operand / miscompile). Use GenTree*& op3 = node->Op(3); (or otherwise pass &node->Op(3) / an operand reference) when calling RewriteHWIntrinsicToNonMask.
    GenTree* op2 = node->Op(2);
    GenTree* op3 = node->Op(3);

    // We're in the post-order visit and are traversing in execution order, so
    // everything between op2 and node will have already been rewritten to LIR
    // form and doing the IsInvariantInRange check is safe. This allows us to
    // catch cases where something is embedded masking compatible but where we
    // could never actually contain it and so we want to rewrite it to the non-mask
    // variant
    SideEffectSet scratchSideEffects;

    if (scratchSideEffects.IsLirInvariantInRange(m_compiler, op2, node))
    {
        unsigned  tgtMaskSize     = simdSize / genTypeSize(simdBaseType);
        var_types tgtSimdBaseType = TYP_UNDEF;

        if (op2->isEmbeddedMaskingCompatible(m_compiler, tgtMaskSize, tgtSimdBaseType))
        {
            // Make sure we had a mask to begin with. We don't want to create a mask
            // solely for the purpose of embedding it.

            if (!op3->OperIsHWIntrinsic() ||
                (op3->AsHWIntrinsic()->GetHWIntrinsicId() != NI_AVX512_ConvertVectorToMask))
            {
                // We are going to utilize the embedded mask, so we don't need to rewrite. However,
                // we want to fixup the simdBaseType here since it simplifies lowering and allows
                // both embedded broadcast and the mask to be live simultaneously.

                if (tgtSimdBaseType != TYP_UNDEF)
                {
                    op2->AsHWIntrinsic()->SetSimdBaseType(tgtSimdBaseType);
                }
                return;
            }
        }
    }

    if (!ShouldRewriteToNonMaskHWIntrinsic(op3))
    {
        return;
    }

    parents.Push(op3);
    RewriteHWIntrinsicToNonMask(&op3, parents);
    (void)parents.Pop();

@saucecontrol
Copy link
Copy Markdown
Member Author

cc @dotnet/jit-contrib

SPMI doesn't show any diffs, but this does fix up my motivating case. Something like:

static Vector128<float> AddToNegative(Vector128<float> v1, Vector128<float> v2)
    => Sse41.BlendVariable(v1, v1 + v2, v1);
    vmovups  xmm0, xmmword ptr [rdx]
-   vpmovd2m k1, xmm0
-   vaddps   xmm0 {k1}, xmm0, xmmword ptr [r8]
+   vaddps   xmm1, xmm0, xmmword ptr [r8]
+   vblendvps xmm0, xmm0, xmm1, xmm0
    vmovups  xmmword ptr [rcx], xmm0
    mov      rax, rcx
    ret      
-; Total bytes of code: 24
+; Total bytes of code: 23

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants