-
Notifications
You must be signed in to change notification settings - Fork 15.6k
[CIR][X86] Implement convert_half builtins #171615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-clang Author: Priyanshu Kumar (Priyanshu3820) Changesrelated to #167765 Calls the LLVM intrinsic for the following builtins-
Full diff: https://github.com/llvm/llvm-project/pull/171615.diff 2 Files Affected:
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp b/clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
index fb17e31bf36d6..75022d4f93d4a 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
@@ -1514,12 +1514,40 @@ mlir::Value CIRGenFunction::emitX86BuiltinExpr(unsigned builtinID,
case X86::BI__builtin_ia32_cmpnltsd:
case X86::BI__builtin_ia32_cmpnlesd:
case X86::BI__builtin_ia32_cmpordsd:
- case X86::BI__builtin_ia32_vcvtph2ps_mask:
- case X86::BI__builtin_ia32_vcvtph2ps256_mask:
- case X86::BI__builtin_ia32_vcvtph2ps512_mask:
- case X86::BI__builtin_ia32_cvtneps2bf16_128_mask:
- case X86::BI__builtin_ia32_cvtneps2bf16_256_mask:
- case X86::BI__builtin_ia32_cvtneps2bf16_512_mask:
+ cgm.errorNYI(expr->getSourceRange(),
+ std::string("unimplemented X86 builtin call: ") +
+ getContext().BuiltinInfo.getName(builtinID));
+ return {};
+ case X86::BI__builtin_ia32_vcvtph2ps_mask: {
+ mlir::Location loc = getLoc(expr->getExprLoc());
+ return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.128",
+ convertType(expr->getType()), ops);
+ }
+ case X86::BI__builtin_ia32_vcvtph2ps256_mask: {
+ mlir::Location loc = getLoc(expr->getExprLoc());
+ return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.256",
+ convertType(expr->getType()), ops);
+ }
+ case X86::BI__builtin_ia32_vcvtph2ps512_mask: {
+ mlir::Location loc = getLoc(expr->getExprLoc());
+ return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.512",
+ convertType(expr->getType()), ops);
+ }
+ case X86::BI__builtin_ia32_cvtneps2bf16_128_mask: {
+ mlir::Location loc = getLoc(expr->getExprLoc());
+ return emitIntrinsicCallOp(builder, loc, "x86.avx512bf16.mask.cvtneps2bf16.128",
+ convertType(expr->getType()), ops);
+ }
+ case X86::BI__builtin_ia32_cvtneps2bf16_256_mask: {
+ mlir::Location loc = getLoc(expr->getExprLoc());
+ return emitIntrinsicCallOp(builder, loc, "x86.avx512bf16.cvtneps2bf16.256",
+ convertType(expr->getType()), ops);
+ }
+ case X86::BI__builtin_ia32_cvtneps2bf16_512_mask: {
+ mlir::Location loc = getLoc(expr->getExprLoc());
+ return emitIntrinsicCallOp(builder, loc, "x86.avx512bf16.cvtneps2bf16.512",
+ convertType(expr->getType()), ops);
+ }
case X86::BI__cpuid:
case X86::BI__cpuidex:
case X86::BI__emul:
diff --git a/clang/test/CIR/CodeGen/X86/cir-convert-half.c b/clang/test/CIR/CodeGen/X86/cir-convert-half.c
new file mode 100644
index 0000000000000..4fad9aa02cfc1
--- /dev/null
+++ b/clang/test/CIR/CodeGen/X86/cir-convert-half.c
@@ -0,0 +1,55 @@
+// Test X86-specific convert_half builtins (4-argument form)
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-feature +avx512fp16 -target-feature +avx512bf16 -fclangir -emit-cir %s -o - | FileCheck --check-prefix=CIR %s
+
+typedef float __m512 __attribute__((__vector_size__(64), __aligned__(64)));
+typedef float __m256 __attribute__((__vector_size__(32), __aligned__(32)));
+typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));
+typedef int __m256i __attribute__((__vector_size__(32), __aligned__(32)));
+typedef int __m128i __attribute__((__vector_size__(16), __aligned__(16)));
+typedef int __mmask16;
+typedef unsigned char __mmask8;
+typedef __bf16 __m256bh __attribute__((__vector_size__(32), __aligned__(32)));
+typedef __bf16 __m128bh __attribute__((__vector_size__(16), __aligned__(16)));
+
+// Test for __builtin_ia32_vcvtph2ps512_mask
+__m512 test_vcvtph2ps512_mask(__m256i a, __m512 src, __mmask16 k, __m512 passthru) {
+ return __builtin_ia32_vcvtph2ps512_mask(a, src, k, passthru);
+}
+// CIR-LABEL: cir.func {{.*}}@test_vcvtph2ps512_mask
+// CIR: cir.call @llvm.x86.avx512.mask.vcvtph2ps.512
+
+// Test for __builtin_ia32_vcvtph2ps256_mask
+__m256 test_vcvtph2ps256_mask(__m128i a, __m256 src, __mmask8 k, __m256 passthru) {
+ return __builtin_ia32_vcvtph2ps256_mask(a, src, k, passthru);
+}
+// CIR-LABEL: cir.func {{.*}}@test_vcvtph2ps256_mask
+// CIR: cir.call @llvm.x86.avx512.mask.vcvtph2ps.256
+
+// Test for __builtin_ia32_vcvtph2ps_mask
+__m128 test_vcvtph2ps_mask(__m128i a, __m128 src, __mmask8 k, __m128 passthru) {
+ return __builtin_ia32_vcvtph2ps_mask(a, src, k, passthru);
+}
+// CIR-LABEL: cir.func {{.*}}@test_vcvtph2ps_mask
+// CIR: cir.call @llvm.x86.avx512.mask.vcvtph2ps.128
+
+// Test for __builtin_ia32_cvtneps2bf16_512_mask
+__m256bh test_cvtneps2bf16_512_mask(__m512 a, __m256bh w, __mmask16 u, __m256bh passthru) {
+ return __builtin_ia32_cvtneps2bf16_512_mask(a, w, u, passthru);
+}
+// CIR-LABEL: cir.func {{.*}}@test_cvtneps2bf16_512_mask
+// CIR: cir.call @llvm.x86.avx512bf16.cvtneps2bf16.512
+
+// Test for __builtin_ia32_cvtneps2bf16_256_mask
+__m128bh test_cvtneps2bf16_256_mask(__m256 a, __m128bh w, __mmask8 u, __m128bh passthru) {
+ return __builtin_ia32_cvtneps2bf16_256_mask(a, w, u, passthru);
+}
+// CIR-LABEL: cir.func {{.*}}@test_cvtneps2bf16_256_mask
+// CIR: cir.call @llvm.x86.avx512bf16.cvtneps2bf16.256
+
+// Test for __builtin_ia32_cvtneps2bf16_128_mask
+__m128bh test_cvtneps2bf16_128_mask(__m128 a, __m128bh w, __mmask8 u, __m128bh passthru) {
+ return __builtin_ia32_cvtneps2bf16_128_mask(a, w, u, passthru);
+}
+// CIR-LABEL: cir.func {{.*}}@test_cvtneps2bf16_128_mask
+// CIR: cir.call @llvm.x86.avx512bf16.mask.cvtneps2bf16.128
\ No newline at end of file
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
🐧 Linux x64 Test Results
Failed Tests(click on a test name to see its output) ClangClang.CIR/CodeGenBuiltins/X86/avx512vlbf16-builtins.cClang.CIR/CodeGenBuiltins/X86/avx512vlbf16-builtins.cIf these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the |
badumbatish
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use clang/tools/clang-format/git-clang-format --binary ./build/bin/clang-format HEAD~1 to format your changes so it doesn't trigger the CI warning.
| case X86::BI__builtin_ia32_vcvtph2ps_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.128", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_vcvtph2ps256_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.256", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_vcvtph2ps512_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.512", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_cvtneps2bf16_128_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512bf16.mask.cvtneps2bf16.128", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_cvtneps2bf16_256_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512bf16.cvtneps2bf16.256", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_cvtneps2bf16_512_mask: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heyya, for this you should implement sth akin to below to save on space and readability
case X86::BI__builtin_ia32_reduce_fmax_pd512:
case X86::BI__builtin_ia32_reduce_fmax_ps512:
case X86::BI__builtin_ia32_reduce_fmax_ph512:
case X86::BI__builtin_ia32_reduce_fmax_ph256:
case X86::BI__builtin_ia32_reduce_fmax_ph128: {
StringRef intrinsicName = "";
switch (builtinID) {
case X86::BI__builtin_ia32_reduce_fmax_pd512:
intrinsicName = "vector.reduce.fmax.v8f64";
break;
case X86::BI__builtin_ia32_reduce_fmax_ps512:
intrinsicName = "vector.reduce.fmax.v16f32";
break;
case X86::BI__builtin_ia32_reduce_fmax_ph512:
intrinsicName = "vector.reduce.fmax.v32f16";
break;
case X86::BI__builtin_ia32_reduce_fmax_ph256:
intrinsicName = "vector.reduce.fmax.v16f16";
break;
case X86::BI__builtin_ia32_reduce_fmax_ph128:
intrinsicName = "vector.reduce.fmax.v8f16";
break;
}
return emitIntrinsicCallOp(...);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heyya, for this you should implement sth akin to below to save on space and readability
case X86::BI__builtin_ia32_reduce_fmax_pd512: case X86::BI__builtin_ia32_reduce_fmax_ps512: case X86::BI__builtin_ia32_reduce_fmax_ph512: case X86::BI__builtin_ia32_reduce_fmax_ph256: case X86::BI__builtin_ia32_reduce_fmax_ph128: { StringRef intrinsicName = ""; switch (builtinID) { case X86::BI__builtin_ia32_reduce_fmax_pd512: intrinsicName = "vector.reduce.fmax.v8f64"; break; case X86::BI__builtin_ia32_reduce_fmax_ps512: intrinsicName = "vector.reduce.fmax.v16f32"; break; case X86::BI__builtin_ia32_reduce_fmax_ph512: intrinsicName = "vector.reduce.fmax.v32f16"; break; case X86::BI__builtin_ia32_reduce_fmax_ph256: intrinsicName = "vector.reduce.fmax.v16f16"; break; case X86::BI__builtin_ia32_reduce_fmax_ph128: intrinsicName = "vector.reduce.fmax.v8f16"; break; } return emitIntrinsicCallOp(...);
I too considered this way and I take it that it's your implementation of the reduce intrinsics right? But I think we are supposed to follow the already established pattern aren't we? I would be more than happy to implement this way though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too considered this way and I take it that it's your implementation of the reduce intrinsics right? But I think we are supposed to follow the already established pattern aren't we? I would be more than happy to implement this way though.
In general, we are trying to follow the established code patterns, but if the code can be improved it's OK to diverge. I like the suggestion from @badumbatish here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do it that way then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too considered this way and I take it that it's your implementation of the reduce intrinsics right? But I think we are supposed to follow the already established pattern aren't we? I would be more than happy to implement this way though.
In general, we are trying to follow the established code patterns, but if the code can be improved it's OK to diverge. I like the suggestion from @badumbatish here.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably go into a new file inside clang/test/CIR/CodeGenBuiltins/X86/ to match OG's clang/test/CodeGen/X86/avx512vlbf16-builtins.c.
You can check out that file to import all the test over as well.
Don't forget to set up 3 prefix: CIR, LLVM and OGCG as well, see clang/test/CIR/CodeGenBuiltins/X86/avx512fp16-builtins.c for details
done |
0411ce1 to
8ca7bbb
Compare
|
✅ With the latest revision this PR passed the Python code formatter. |
|
@andykaylor, the CI checks are failing and the build log is saying this-
Neither did I edit this file nor does it contain |
|
02efe9e to
dd5108b
Compare
Yes, they worked this time. |
| case X86::BI__builtin_ia32_vcvtph2ps_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.128", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_vcvtph2ps256_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.256", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_vcvtph2ps512_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512.mask.vcvtph2ps.512", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_cvtneps2bf16_128_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512bf16.mask.cvtneps2bf16.128", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_cvtneps2bf16_256_mask: { | ||
| mlir::Location loc = getLoc(expr->getExprLoc()); | ||
| return emitIntrinsicCallOp(builder, loc, "x86.avx512bf16.cvtneps2bf16.256", | ||
| convertType(expr->getType()), ops); | ||
| } | ||
| case X86::BI__builtin_ia32_cvtneps2bf16_512_mask: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too considered this way and I take it that it's your implementation of the reduce intrinsics right? But I think we are supposed to follow the already established pattern aren't we? I would be more than happy to implement this way though.
In general, we are trying to follow the established code patterns, but if the code can be improved it's OK to diverge. I like the suggestion from @badumbatish here.
| // OGCG-LABEL: test_cvtneps2bf16_128_mask | ||
| __bf16 test_cvtneps2bf16_128_mask(__m128 a, __bf16 w, __mmask8 u) { | ||
| return __builtin_ia32_cvtneps2bf16_128_mask(a, w, u); | ||
| } No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing a newline at the end of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
| case X86::BI__builtin_ia32_cvtneps2bf16_256_mask: | ||
| intrinsicName = "x86.avx512bf16.cvtneps2bf16.256"; | ||
| break; | ||
| case X86::BI__builtin_ia32_cvtneps2bf16_512_mask: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to follow the intrinsic with a select operation for BI__builtin_ia32_cvtneps2bf16_256_mask and BI__builtin_ia32_cvtneps2bf16_512_mask
| case X86::BI__builtin_ia32_vcvtph2ps256_mask: | ||
| intrinsicName = "x86.avx512.mask.vcvtph2ps.256"; | ||
| break; | ||
| case X86::BI__builtin_ia32_vcvtph2ps512_mask: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BI__builtin_ia32_vcvtph2ps_mask, BI__builtin_ia32_vcvtph2ps256_mask, and BI__builtin_ia32_vcvtph2ps512_mask aren't this simple. In classic codegen they are implemented with a call to EmitX86CvtF16ToFloatExpr, which has quite a bit more code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, we'll need to create a helper like EmitX86CvtF16ToFloatExpr in CIR first, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, except in CIR it should be emitX86CvtF16ToFloatExpr due to our slightly different coding style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it as emitCIRX86CvtF16ToFloatExpr. Should I rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i renamed it to emitX86CvtF16ToFloatExpr, @andykaylor
| // LLVM-LABEL: test_vcvtph2ps512_mask | ||
| // OGCG-LABEL: test_vcvtph2ps512_mask | ||
| __m512 test_vcvtph2ps512_mask(__m256i a, __m512 src, __mmask16 k) { | ||
| return __builtin_ia32_vcvtph2ps512_mask(a, src, k, 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than creating a new test and calling these builtins directly, you should create clang/test/CIR/CodeGenBuiltins/X86/avx512vlbf16-builtins.c and test these by calling the wrapper functions defined in clang/lib/Headers/avx512vlbf16intrin.h. You may use clang/test/CodeGen/X86/avx512vlbf16-builtins.c as a starting point, but you'll need to change the RUN lines and the checks, and you won't want to copy unrelated tests.
| // LLVM-LABEL: test_vcvtph2ps256_mask | ||
| // OGCG-LABEL: test_vcvtph2ps256_mask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These checks are not doing anything useful. You should have check lines that show the details of the intrinsic call that was generated.
75080cc to
edd7838
Compare
b914cf8 to
53bd59c
Compare
related to #167765
Calls the LLVM intrinsic for the following builtins-
BI__builtin_ia32_vcvtph2ps_maskBI__builtin_ia32_vcvtph2ps256_maskBI__builtin_ia32_vcvtph2ps512_maskBI__builtin_ia32_cvtneps2bf16_128_maskBI__builtin_ia32_cvtneps2bf16_256_maskBI__builtin_ia32_cvtneps2bf16_512_mask