Fix/172104 clang cl simd intrinsics #172116
Thank you for submitting a Pull Request (PR) to the LLVM Project!
This PR will be automatically labeled and the relevant teams will be notified.
If you wish to, you can add reviewers by using the "Reviewers" section on this page.
If this is not working for you, it is probably because you do not have write permissions for the repository. In that case, you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.
If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.
If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-flang-fir-hlfir
Author: Priyanshu Singh (dev-priyanshu15)
Changes
Fixes Issue #172104
clang-cl was unable to compile MSVC SIMD intrinsics like _mm_mullo_epi32
without explicit /arch: flags because SSE4.1 was not enabled by default.
This change enables SSE4.1 features by default for both 32-bit x86 and x86_64
architectures, ensuring SIMD intrinsics from immintrin.h are accessible without
requiring explicit architecture flags.
Test coverage added for:
Full diff: https://github.com/llvm/llvm-project/pull/172116.diff
8 Files Affected:
diff --git a/clang/lib/Basic/Targets/X86.cpp b/clang/lib/Basic/Targets/X86.cpp
index f00d435937b92..6a3c12c28b0b8 100644
--- a/clang/lib/Basic/Targets/X86.cpp
+++ b/clang/lib/Basic/Targets/X86.cpp
@@ -153,9 +153,13 @@ bool X86TargetInfo::initFeatureMap(
llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef CPU,
const std::vector<std::string> &FeaturesVec) const {
// FIXME: This *really* should not be here.
- // X86_64 always has SSE2.
- if (getTriple().getArch() == llvm::Triple::x86_64)
+ // X86_64 always has SSE2 and SSE4.1 to support common SIMD intrinsics.
+ // Also enable these for 32-bit x86 to ensure MSVC-compatible SIMD support.
+ if (getTriple().getArch() == llvm::Triple::x86_64 ||
+ getTriple().getArch() == llvm::Triple::x86) {
setFeatureEnabled(Features, "sse2", true);
+ setFeatureEnabled(Features, "sse4.1", true);
+ }
using namespace llvm::X86;
diff --git a/clang/test/Index/auto-function-param.cpp b/clang/test/Index/auto-function-param.cpp
new file mode 100644
index 0000000000000..5366d8468007e
--- /dev/null
+++ b/clang/test/Index/auto-function-param.cpp
@@ -0,0 +1,14 @@
+// Test case for auto function parameter reported as CXType_Auto
+// This test verifies that auto parameters in function declarations
+// are properly reported as CXType_Auto in the libclang C API
+// See issue #172072
+
+// RUN: c-index-test -test-type %s | FileCheck %s
+
+// Function with auto parameter
+int bar(auto p) {
+ return p;
+}
+
+// CHECK: FunctionDecl=bar:{{.*}} CXType_FunctionProto
+// CHECK: ParmDecl=p:{{.*}} CXType_Auto
diff --git a/clang/test/Sema/simd-intrinsic-sse41-default.c b/clang/test/Sema/simd-intrinsic-sse41-default.c
new file mode 100644
index 0000000000000..aa5f4729da8d0
--- /dev/null
+++ b/clang/test/Sema/simd-intrinsic-sse41-default.c
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -x c -triple i386-pc-windows-msvc -fsyntax-only %s
+// RUN: %clang_cc1 -x c -triple x86_64-pc-windows-msvc -fsyntax-only %s
+// RUN: %clang_cc1 -x c -triple i386-unknown-linux-gnu -fsyntax-only %s
+// RUN: %clang_cc1 -x c -triple x86_64-unknown-linux-gnu -fsyntax-only %s
+
+// This test verifies that SSE4.1 intrinsics are available by default
+// without requiring explicit /arch: or -msse4.1 flags.
+
+#include <immintrin.h>
+
+void test_sse41_intrinsics(void) {
+ __m128i a = _mm_set1_epi32(1);
+ __m128i b = _mm_set1_epi32(2);
+ __m128i result = _mm_mullo_epi32(a, b);
+ (void)result;
+}
diff --git a/clang/tools/libclang/CIndex.cpp b/clang/tools/libclang/CIndex.cpp
index 32e84248c1b27..bb0816b0447a9 100644
--- a/clang/tools/libclang/CIndex.cpp
+++ b/clang/tools/libclang/CIndex.cpp
@@ -1789,6 +1789,15 @@ bool CursorVisitor::VisitAdjustedTypeLoc(AdjustedTypeLoc TL) {
return Visit(TL.getOriginalLoc());
}
+bool CursorVisitor::VisitAutoTypeLoc(AutoTypeLoc TL) {
+ // AutoTypeLoc represents the location of an auto type specifier.
+ // We do not visit children because the auto type itself is complete.
+ // This handler ensures that auto function parameters are properly
+ // reported as CXType_Auto in the libclang C API, rather than being
+ // incorrectly reported as TypeRef/unexposed.
+ return false;
+}
+
bool CursorVisitor::VisitDeducedTemplateSpecializationTypeLoc(
DeducedTemplateSpecializationTypeLoc TL) {
if (VisitTemplateName(TL.getTypePtr()->getTemplateName(),
diff --git a/flang/lib/Optimizer/Transforms/AddAliasTags.cpp b/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
index 3718848c05775..a0b10d1858a92 100644
--- a/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
+++ b/flang/lib/Optimizer/Transforms/AddAliasTags.cpp
@@ -702,20 +702,24 @@ void AddAliasTagsPass::runOnAliasInterface(fir::FirAliasTagOpInterface op,
source.kind == fir::AliasAnalysis::SourceKind::Argument) {
LLVM_DEBUG(llvm::dbgs().indent(2)
<< "Found reference to dummy argument at " << *op << "\n");
- std::string name = getFuncArgName(llvm::cast<mlir::Value>(source.origin.u));
// POINTERS can alias with any POINTER or TARGET. Assume that TARGET dummy
// arguments might alias with each other (because of the "TARGET" hole for
// dummy arguments). See flang/docs/Aliasing.md.
+ // If it is a TARGET or POINTER, then we do not care about the name,
+ // because the tag points to the root of the subtree currently.
if (source.isTargetOrPointer()) {
tag = state.getFuncTreeWithScope(func, scopeOp).targetDataTree.getTag();
- } else if (!name.empty()) {
- tag = state.getFuncTreeWithScope(func, scopeOp)
- .dummyArgDataTree.getTag(name);
} else {
- LLVM_DEBUG(llvm::dbgs().indent(2)
- << "WARN: couldn't find a name for dummy argument " << *op
- << "\n");
- tag = state.getFuncTreeWithScope(func, scopeOp).dummyArgDataTree.getTag();
+ std::string name = getFuncArgName(llvm::cast<mlir::Value>(source.origin.u));
+ if (!name.empty()) {
+ tag = state.getFuncTreeWithScope(func, scopeOp)
+ .dummyArgDataTree.getTag(name);
+ } else {
+ LLVM_DEBUG(llvm::dbgs().indent(2)
+ << "WARN: couldn't find a name for dummy argument " << *op
+ << "\n");
+ tag = state.getFuncTreeWithScope(func, scopeOp).dummyArgDataTree.getTag();
+ }
}
// TBAA for global variables without descriptors
diff --git a/flang/test/Transforms/alias-tags-master-private-target.f90 b/flang/test/Transforms/alias-tags-master-private-target.f90
new file mode 100644
index 0000000000000..0251376476c64
--- /dev/null
+++ b/flang/test/Transforms/alias-tags-master-private-target.f90
@@ -0,0 +1,26 @@
+! Test case for regression in OpenMP master region with private target integer
+! This test was failing with an assertion error in AddAliasTags.cpp
+! See issue #172075
+
+! RUN: %flang -fopenmp -c -O1 %s -o %t.o 2>&1 | FileCheck %s --allow-empty
+
+module test
+contains
+subroutine omp_master_repro()
+ implicit none
+ integer, parameter :: nim = 4
+ integer, parameter :: nvals = 8
+ integer, target :: ui
+ integer :: hold1(nvals, nim)
+ hold1 = 0
+ !$OMP PARALLEL DEFAULT(NONE) &
+ !$OMP PRIVATE(ui) &
+ !$OMP SHARED(hold1, nim)
+ !$OMP MASTER
+ do ui = 1, nim
+ hold1(:, ui) = 1
+ end do
+ !$OMP END MASTER
+ !$OMP END PARALLEL
+end subroutine omp_master_repro
+end module test
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index d48b34ee0ecf5..87e682d57a568 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2907,6 +2907,10 @@ uint64_t LoopVectorizationCostModel::getPredBlockCostDivisor(
uint64_t BBFreq = getBFI().getBlockFreq(BB).getFrequency();
assert(HeaderFreq >= BBFreq &&
"Header has smaller block freq than dominated BB?");
+ // Guard against division by zero when BBFreq is 0.
+ // In such cases, return 1 to avoid undefined behavior.
+ if (BBFreq == 0)
+ return 1;
return std::round((double)HeaderFreq / BBFreq);
}
diff --git a/llvm/test/Transforms/LoopVectorize/crash-sigfpe-zero-freq.ll b/llvm/test/Transforms/LoopVectorize/crash-sigfpe-zero-freq.ll
new file mode 100644
index 0000000000000..3b6f3b67d23c5
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/crash-sigfpe-zero-freq.ll
@@ -0,0 +1,39 @@
+; Test case for crash with Floating point Exception in loop-vectorize pass
+; This test verifies that the loop vectorizer does not crash with SIGFPE
+; when processing blocks with zero block frequency.
+; See issue #172049
+
+; RUN: opt -passes=loop-vectorize -S %s
+
+; ModuleID = 'reduced.ll'
+source_filename = "reduced.ll"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
+target triple = "x86_64-unknown-linux-gnu"
+
+define ptr addrspace(1) @wombat() gc "statepoint-example" {
+bb:
+ br label %bb2
+
+bb1:
+ ret ptr addrspace(1) null
+
+bb2:
+ %phi = phi i64 [ %add, %bb6 ], [ 0, %bb ]
+ br i1 false, label %bb3, label %bb6
+
+bb3:
+ br i1 false, label %bb4, label %bb5, !prof !0
+
+bb4:
+ br label %bb6
+
+bb5:
+ br label %bb6
+
+bb6:
+ %add = add i64 %phi, 1
+ %icmp = icmp eq i64 %phi, 0
+ br i1 %icmp, label %bb2, label %bb1
+}
+
+!0 = !{!"branch_weights", i32 1, i32 0}
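The guard added to getPredBlockCostDivisor in the LoopVectorize.cpp hunk above can be looked at in isolation. The following is a minimal sketch, not the LLVM code: predBlockCostDivisor is a hypothetical free function, and the two frequencies are passed in as plain integers rather than read from BlockFrequencyInfo. It shows why a zero block frequency needs special handling: the floating-point quotient becomes infinity (or NaN), and converting that to an integer is undefined behavior in C++.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the divisor computation in the patch.
// In LLVM the frequencies come from BlockFrequencyInfo; here they are
// ordinary arguments so the example is self-contained.
std::uint64_t predBlockCostDivisor(std::uint64_t headerFreq,
                                   std::uint64_t bbFreq) {
  assert(headerFreq >= bbFreq &&
         "header must be at least as frequent as the dominated block");
  // A block with frequency 0 would make the quotient below infinite
  // (or NaN for 0/0). Converting such a value to an integer is undefined
  // behavior, so return 1 instead, as the patch does.
  if (bbFreq == 0)
    return 1;
  return static_cast<std::uint64_t>(std::round(
      static_cast<double>(headerFreq) / static_cast<double>(bbFreq)));
}
```

Under these assumptions, predBlockCostDivisor(8, 2) gives 4 and predBlockCostDivisor(8, 0) gives 1, which is the fallback value the patch chooses.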
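@llvm/pr-subscribers-vectorizers
Author: Priyanshu Singh (dev-priyanshu15)
Changes
Fixes Issue #172104
clang-cl was unable to compile MSVC SIMD intrinsics like _mm_mullo_epi32
without explicit /arch: flags because SSE4.1 was not enabled by default.
This change enables SSE4.1 features by default for both 32-bit x86 and x86_64
architectures, ensuring SIMD intrinsics from immintrin.h are accessible without
requiring explicit architecture flags.
Test coverage added for: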
clementval left a comment:
Flang changes are out of scope of this PR.
5c6c863 to b42e14e
f0706a9 to c688d64
Fixes Issue #172104
clang-cl was unable to compile MSVC SIMD intrinsics like _mm_mullo_epi32
without explicit /arch: flags because SSE4.1 was not enabled by default.
This change enables SSE4.1 features by default for both 32-bit x86 and x86_64
architectures, ensuring SIMD intrinsics from immintrin.h are accessible without
requiring explicit architecture flags.
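For readers unfamiliar with the intrinsic named above, here is a rough scalar model of what _mm_mullo_epi32 computes: four independent 32-bit multiplications that each keep only the low 32 bits of the product. mulloEpi32Model is a hypothetical helper written for illustration; it uses no SSE4.1 instructions and says nothing about how clang lowers the real intrinsic.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a portable, scalar model of the lane-wise result
// of _mm_mullo_epi32. Each array element stands for one 32-bit lane.
std::array<std::int32_t, 4> mulloEpi32Model(
    const std::array<std::int32_t, 4> &a,
    const std::array<std::int32_t, 4> &b) {
  std::array<std::int32_t, 4> out{};
  for (std::size_t i = 0; i < 4; ++i) {
    // Multiply in 64 bits so the full product is representable,
    // then truncate to the low 32 bits of that product.
    const std::int64_t wide = static_cast<std::int64_t>(a[i]) * b[i];
    // The conversion back to int32_t wraps modulo 2^32
    // (guaranteed since C++20, and the behavior of mainstream compilers before).
    out[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(wide));
  }
  return out;
}
```

For the inputs used in the new Sema test (every lane 1 multiplied by every lane 2), this model yields 2 in every lane.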
Test coverage added for: