test failures with POWER10 kernel and GCC 16 #5728
Open
Description
I am seeing test failures with the POWER10 kernel when it is built with GCC 16 (gcc-16.0.1-0.10.fc45.ppc64le) and run on Power10 hardware. The tests pass when built with GCC 15 on the same hardware. Based on previous experience, I would guess that GCC 16 has become stricter (or more advanced) again and that the inline assembly code in the Power10 kernel is no longer fully valid. In the log excerpt below, CTRMM and ZTRMM fail their computational tests, and SBGEMV failures are reported as well.
...
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o dblat3 dblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o cblat2 cblat2.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o zblat2 zblat2.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
rm -f ?BLAT2.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_bgemv > BBLAT2.SUMM
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o cblat3 cblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
gfortran -O2 -Wall -frecursive -fno-optimize-sibling-calls -m64 -fopenmp -O2 -frecursive -mcpu=power10 -mtune=power10 -fno-fast-math -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -fno-tree-vectorize -o zblat3 zblat3.o ../libopenblas_power10p-r0.3.32.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/ppc64le-redhat-linux/16 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/ppc64le-redhat-linux/16/../../.. -L/lib -L/usr/lib -latomic_asneeded -lc
rm -f ?BLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_sbgemm > SBBLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_bgemm > BBLAT3.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./test_sbgemv > SBBLAT2.SUMM
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat3 < ./sblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat3 < ./dblat3.dat
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat3 < ./cblat3.dat
TESTS OF THE COMPLEX LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
CGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
CHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( 1.57757 , -0.324314 ) ( 1.57757 , -0.324314 )
2 ( -0.149664 , 0.581641 ) ( -0.149664 , 0.581641 )
3 ( -0.748555 , -1.09547 ) ( -0.748555 , -1.09547 )
4 ( -0.604366 , -0.895836 ) ( -0.604366 , -0.895836 )
5 ( -0.650925 , 0.394394 ) ( -0.650925 , 0.394394 )
6 ( -0.465727 , 0.842006 ) ( -0.465727 , 0.842006 )
7 ( 0.420629 , 0.597693 ) ( -0.587136E-01, -0.543813 )
8 ( 0.786457 , 0.544220E-01) ( -0.138154E-01, 0.184827 )
9 ( 0.167691 , 0.207608 ) ( 0.167691 , 0.207608 )
10 ( -0.321436 , -0.667076 ) ( -0.321436 , -0.667076 )
11 ( -0.303583 , -0.249012E-01) ( -0.303584 , -0.249011E-01)
12 ( -1.20584 , 0.376045 ) ( -1.20584 , 0.376044 )
13 ( 0.280570 , 0.680643 ) ( 0.280570 , 0.680643 )
14 ( 1.11913 , 0.831795 ) ( 1.11913 , 0.831795 )
15 ( -0.445470 , -1.08482 ) ( -0.743962 , 0.729312 )
16 ( -0.425975 , -0.378074 ) ( -0.964980E-01, -0.319019 )
17 ( -0.740210 , -1.03159 ) ( -0.740210 , -1.03159 )
18 ( 1.00878 , 0.580040 ) ( 1.00878 , 0.580040 )
19 ( 0.123999 , -0.418330 ) ( 0.123999 , -0.418330 )
20 ( -0.207821 , -0.467468 ) ( -0.207821 , -0.467468 )
21 ( -0.471160 , -1.47356 ) ( -0.471160 , -1.47356 )
22 ( -0.329621 , 0.782363 ) ( -0.329621 , 0.782364 )
23 ( -0.248915 , 0.671276 ) ( 0.515318 , -0.225023 )
24 ( -0.154857 , -0.108282 ) ( -0.479790E-01, 0.823263E-01)
25 ( 0.327719 , -0.149753 ) ( 0.327719 , -0.149753 )
26 ( 0.104212 , 0.378216 ) ( 0.104212 , 0.378216 )
27 ( 0.111354 , -0.524580E-01) ( 0.111354 , -0.524580E-01)
28 ( 0.301476 , 0.218972E-01) ( 0.301476 , 0.218972E-01)
29 ( -0.185482 , 0.210484 ) ( -0.185482 , 0.210484 )
30 ( 0.535875 , 0.368959 ) ( 0.535875 , 0.368959 )
31 ( 0.969031E-01, 0.298701 ) ( 0.969031E-01, 0.298701 )
THESE ARE THE RESULTS FOR COLUMN 1
******* CTRMM FAILED ON CALL NUMBER:
2450: CTRMM ('L','U','N','U', 31, 7,( 1.0, 0.0), A, 32, B, 32) .
CTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
CHERK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat3 < ./zblat3.dat
TESTS OF THE COMPLEX*16 LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 2.2D-16
ZGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
ZHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( -0.803402E-01, 0.421751 ) ( -0.803402E-01, 0.421751 )
2 ( 0.691964 , 0.209721 ) ( 0.691964 , 0.209721 )
3 ( 0.553420 , -0.312582 ) ( 0.440480 , -0.729041E-02)
4 ( 0.283286 , -0.145302 ) ( 0.153001 , 0.189155 )
5 ( -0.816776E-01, -0.546559 ) ( -0.816776E-01, -0.546559 )
6 ( -0.270234 , 0.120707 ) ( -0.270234 , 0.120707 )
7 ( 0.106893 , 0.242757 ) ( 0.106893 , 0.242757 )
******* ZTRMM FAILED ON CALL NUMBER:
1802: ZTRMM ('L','U','N','U', 7, 1,( 1.0, 0.0), A, 8, B, 8) .
ZTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
ZHERK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
rm -f ?BLAT3.SUMM
OMP_NUM_THREADS=2 ./test_sbgemm > SBBLAT3.SUMM
SBGEMV FAILURES: 705118
make[1]: *** [Makefile:149: level2] Error 1
make[1]: *** Waiting for unfinished jobs....
OMP_NUM_THREADS=2 ./test_bgemm > BBLAT3.SUMM
OMP_NUM_THREADS=2 ./sblat3 < ./sblat3.dat
OMP_NUM_THREADS=2 ./dblat3 < ./dblat3.dat
OMP_NUM_THREADS=2 ./cblat3 < ./cblat3.dat
TESTS OF THE COMPLEX LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 1.2E-07
CGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
CHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( 1.57757 , -0.324314 ) ( 1.57757 , -0.324314 )
2 ( -0.149664 , 0.581641 ) ( -0.149664 , 0.581641 )
3 ( -0.748555 , -1.09547 ) ( -0.748555 , -1.09547 )
4 ( -0.604366 , -0.895836 ) ( -0.604366 , -0.895836 )
5 ( -0.650925 , 0.394394 ) ( -0.650925 , 0.394394 )
6 ( -0.465727 , 0.842006 ) ( -0.465727 , 0.842006 )
7 ( 0.420629 , 0.597693 ) ( -0.587136E-01, -0.543813 )
8 ( 0.786457 , 0.544220E-01) ( -0.138154E-01, 0.184827 )
9 ( 0.167691 , 0.207608 ) ( 0.167691 , 0.207608 )
10 ( -0.321436 , -0.667076 ) ( -0.321436 , -0.667076 )
11 ( -0.303583 , -0.249012E-01) ( -0.303584 , -0.249011E-01)
12 ( -1.20584 , 0.376045 ) ( -1.20584 , 0.376044 )
13 ( 0.280570 , 0.680643 ) ( 0.280570 , 0.680643 )
14 ( 1.11913 , 0.831795 ) ( 1.11913 , 0.831795 )
15 ( -0.445470 , -1.08482 ) ( -0.743962 , 0.729312 )
16 ( -0.425975 , -0.378074 ) ( -0.964980E-01, -0.319019 )
17 ( -0.740210 , -1.03159 ) ( -0.740210 , -1.03159 )
18 ( 1.00878 , 0.580040 ) ( 1.00878 , 0.580040 )
19 ( 0.123999 , -0.418330 ) ( 0.123999 , -0.418330 )
20 ( -0.207821 , -0.467468 ) ( -0.207821 , -0.467468 )
21 ( -0.471160 , -1.47356 ) ( -0.471160 , -1.47356 )
22 ( -0.329621 , 0.782363 ) ( -0.329621 , 0.782364 )
23 ( -0.248915 , 0.671276 ) ( 0.515318 , -0.225023 )
24 ( -0.154857 , -0.108282 ) ( -0.479790E-01, 0.823263E-01)
25 ( 0.327719 , -0.149753 ) ( 0.327719 , -0.149753 )
26 ( 0.104212 , 0.378216 ) ( 0.104212 , 0.378216 )
27 ( 0.111354 , -0.524580E-01) ( 0.111354 , -0.524580E-01)
28 ( 0.301476 , 0.218972E-01) ( 0.301476 , 0.218972E-01)
29 ( -0.185482 , 0.210484 ) ( -0.185482 , 0.210484 )
30 ( 0.535875 , 0.368959 ) ( 0.535875 , 0.368959 )
31 ( 0.969031E-01, 0.298701 ) ( 0.969031E-01, 0.298701 )
THESE ARE THE RESULTS FOR COLUMN 1
******* CTRMM FAILED ON CALL NUMBER:
2450: CTRMM ('L','U','N','U', 31, 7,( 1.0, 0.0), A, 32, B, 32) .
CTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
CHERK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
CSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
OMP_NUM_THREADS=2 ./zblat3 < ./zblat3.dat
TESTS OF THE COMPLEX*16 LEVEL 3 BLAS
THE FOLLOWING PARAMETER VALUES WILL BE USED:
FOR N 0 1 2 3 7 31
FOR ALPHA ( 0.0, 0.0) ( 1.0, 0.0) ( 0.7,-0.9)
FOR BETA ( 0.0, 0.0) ( 1.0, 0.0) ( 1.3,-1.1)
ERROR-EXITS WILL NOT BE TESTED
ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN 16.00
RELATIVE MACHINE PRECISION IS TAKEN TO BE 2.2D-16
ZGEMM PASSED THE COMPUTATIONAL TESTS ( 17496 CALLS)
ZHEMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYMM PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
EXPECTED RESULT COMPUTED RESULT
1 ( -0.803402E-01, 0.421751 ) ( -0.803402E-01, 0.421751 )
2 ( 0.691964 , 0.209721 ) ( 0.691964 , 0.209721 )
3 ( 0.553420 , -0.312582 ) ( 0.440480 , -0.729041E-02)
4 ( 0.283286 , -0.145302 ) ( 0.153001 , 0.189155 )
5 ( -0.816776E-01, -0.546559 ) ( -0.816776E-01, -0.546559 )
6 ( -0.270234 , 0.120707 ) ( -0.270234 , 0.120707 )
7 ( 0.106893 , 0.242757 ) ( 0.106893 , 0.242757 )
******* ZTRMM FAILED ON CALL NUMBER:
1802: ZTRMM ('L','U','N','U', 7, 1,( 1.0, 0.0), A, 8, B, 8) .
ZTRSM PASSED THE COMPUTATIONAL TESTS ( 2592 CALLS)
ZHERK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYRK PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZHER2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
ZSYR2K PASSED THE COMPUTATIONAL TESTS ( 1296 CALLS)
END OF TESTS
make[1]: Leaving directory '/root/projects/OpenBLAS/test'
make: *** [Makefile:176: tests] Error 2