00b155f to
3819863
Compare
CBMC Results (ML-DSA-87): Full Results (175 proofs)
CBMC Results (ML-DSA-44): Full Results (175 proofs)
CBMC Results (ML-DSA-65): Full Results (175 proofs)
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 45681 cycles | 45685 cycles | 1.00 |
| ML-DSA-44 sign | 131153 cycles | 131164 cycles | 1.00 |
| ML-DSA-44 verify | 47527 cycles | 47530 cycles | 1.00 |
| ML-DSA-65 keypair | 80457 cycles | 80479 cycles | 1.00 |
| ML-DSA-65 sign | 215715 cycles | 215740 cycles | 1.00 |
| ML-DSA-65 verify | 79737 cycles | 79735 cycles | 1.00 |
| ML-DSA-87 keypair | 131177 cycles | 131175 cycles | 1.00 |
| ML-DSA-87 sign | 277048 cycles | 277004 cycles | 1.00 |
| ML-DSA-87 verify | 130004 cycles | 129971 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 111983 cycles | 111979 cycles | 1.00 |
| ML-DSA-44 sign | 403592 cycles | 403622 cycles | 1.00 |
| ML-DSA-44 verify | 119886 cycles | 119876 cycles | 1.00 |
| ML-DSA-65 keypair | 192137 cycles | 192166 cycles | 1.00 |
| ML-DSA-65 sign | 657120 cycles | 657078 cycles | 1.00 |
| ML-DSA-65 verify | 193900 cycles | 193891 cycles | 1.00 |
| ML-DSA-87 keypair | 317930 cycles | 318010 cycles | 1.00 |
| ML-DSA-87 sign | 836905 cycles | 836903 cycles | 1.00 |
| ML-DSA-87 verify | 322922 cycles | 322994 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34340 cycles | 34361 cycles | 1.00 |
| ML-DSA-44 sign | 119648 cycles | 120023 cycles | 1.00 |
| ML-DSA-44 verify | 37990 cycles | 38140 cycles | 1.00 |
| ML-DSA-65 keypair | 60562 cycles | 60626 cycles | 1.00 |
| ML-DSA-65 sign | 201239 cycles | 200228 cycles | 1.01 |
| ML-DSA-65 verify | 62873 cycles | 62578 cycles | 1.00 |
| ML-DSA-87 keypair | 93377 cycles | 93913 cycles | 0.99 |
| ML-DSA-87 sign | 232229 cycles | 235482 cycles | 0.99 |
| ML-DSA-87 verify | 94479 cycles | 94514 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 229063 cycles | 232745 cycles | 0.98 |
| ML-DSA-44 sign | 628858 cycles | 629812 cycles | 1.00 |
| ML-DSA-44 verify | 229339 cycles | 229277 cycles | 1.00 |
| ML-DSA-65 keypair | 378941 cycles | 422090 cycles | 0.90 |
| ML-DSA-65 sign | 1007370 cycles | 1067756 cycles | 0.94 |
| ML-DSA-65 verify | 376246 cycles | 393848 cycles | 0.96 |
| ML-DSA-87 keypair | 690237 cycles | 673725 cycles | 1.02 |
| ML-DSA-87 sign | 1396068 cycles | 1405386 cycles | 0.99 |
| ML-DSA-87 verify | 663094 cycles | 657567 cycles | 1.01 |
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 93562 cycles | 93808 cycles | 1.00 |
| ML-DSA-44 sign | 332581 cycles | 332528 cycles | 1.00 |
| ML-DSA-44 verify | 99714 cycles | 99696 cycles | 1.00 |
| ML-DSA-65 keypair | 159833 cycles | 160037 cycles | 1.00 |
| ML-DSA-65 sign | 543737 cycles | 544483 cycles | 1.00 |
| ML-DSA-65 verify | 160524 cycles | 160826 cycles | 1.00 |
| ML-DSA-87 keypair | 267186 cycles | 266702 cycles | 1.00 |
| ML-DSA-87 sign | 707232 cycles | 705628 cycles | 1.00 |
| ML-DSA-87 verify | 270355 cycles | 270568 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68896 cycles | 69270 cycles | 0.99 |
| ML-DSA-44 sign | 187431 cycles | 187049 cycles | 1.00 |
| ML-DSA-44 verify | 68887 cycles | 69047 cycles | 1.00 |
| ML-DSA-65 keypair | 119600 cycles | 119031 cycles | 1.00 |
| ML-DSA-65 sign | 299540 cycles | 299818 cycles | 1.00 |
| ML-DSA-65 verify | 115518 cycles | 115291 cycles | 1.00 |
| ML-DSA-87 keypair | 203742 cycles | 203891 cycles | 1.00 |
| ML-DSA-87 sign | 393131 cycles | 394659 cycles | 1.00 |
| ML-DSA-87 verify | 195707 cycles | 195766 cycles | 1.00 |
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 57327 cycles | 56563 cycles | 1.01 |
| ML-DSA-44 sign | 180726 cycles | 181874 cycles | 0.99 |
| ML-DSA-44 verify | 60901 cycles | 61156 cycles | 1.00 |
| ML-DSA-65 keypair | 98660 cycles | 98757 cycles | 1.00 |
| ML-DSA-65 sign | 298138 cycles | 298537 cycles | 1.00 |
| ML-DSA-65 verify | 100095 cycles | 100518 cycles | 1.00 |
| ML-DSA-87 keypair | 152331 cycles | 152679 cycles | 1.00 |
| ML-DSA-87 sign | 355616 cycles | 355558 cycles | 1.00 |
| ML-DSA-87 verify | 154183 cycles | 152966 cycles | 1.01 |
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 41477 cycles | 41136 cycles | 1.01 |
| ML-DSA-44 sign | 133286 cycles | 132617 cycles | 1.01 |
| ML-DSA-44 verify | 44031 cycles | 44492 cycles | 0.99 |
| ML-DSA-65 keypair | 72317 cycles | 72104 cycles | 1.00 |
| ML-DSA-65 sign | 213181 cycles | 214651 cycles | 0.99 |
| ML-DSA-65 verify | 71974 cycles | 72444 cycles | 0.99 |
| ML-DSA-87 keypair | 107833 cycles | 107657 cycles | 1.00 |
| ML-DSA-87 sign | 250476 cycles | 250266 cycles | 1.00 |
| ML-DSA-87 verify | 109230 cycles | 112595 cycles | 0.97 |
oqs-bot
left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 3819863 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-65 keypair | 75829 cycles | 72591 cycles | 1.04 |
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 134758 cycles | 134710 cycles | 1.00 |
| ML-DSA-44 sign | 523723 cycles | 526054 cycles | 1.00 |
| ML-DSA-44 verify | 147705 cycles | 147500 cycles | 1.00 |
| ML-DSA-65 keypair | 226449 cycles | 226690 cycles | 1.00 |
| ML-DSA-65 sign | 860712 cycles | 861192 cycles | 1.00 |
| ML-DSA-65 verify | 235070 cycles | 235381 cycles | 1.00 |
| ML-DSA-87 keypair | 370974 cycles | 370668 cycles | 1.00 |
| ML-DSA-87 sign | 1079141 cycles | 1078305 cycles | 1.00 |
| ML-DSA-87 verify | 383049 cycles | 383429 cycles | 1.00 |
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157394 cycles | 157188 cycles | 1.00 |
| ML-DSA-44 sign | 549561 cycles | 548996 cycles | 1.00 |
| ML-DSA-44 verify | 169498 cycles | 169283 cycles | 1.00 |
| ML-DSA-65 keypair | 267800 cycles | 269077 cycles | 1.00 |
| ML-DSA-65 sign | 903011 cycles | 906033 cycles | 1.00 |
| ML-DSA-65 verify | 273909 cycles | 275229 cycles | 1.00 |
| ML-DSA-87 keypair | 449680 cycles | 448040 cycles | 1.00 |
| ML-DSA-87 sign | 1161535 cycles | 1157923 cycles | 1.00 |
| ML-DSA-87 verify | 460234 cycles | 457343 cycles | 1.01 |
Graviton4
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67942 cycles | 68235 cycles | 1.00 |
| ML-DSA-44 sign | 201998 cycles | 201899 cycles | 1.00 |
| ML-DSA-44 verify | 70776 cycles | 70799 cycles | 1.00 |
| ML-DSA-65 keypair | 121036 cycles | 121045 cycles | 1.00 |
| ML-DSA-65 sign | 331322 cycles | 331301 cycles | 1.00 |
| ML-DSA-65 verify | 117850 cycles | 117988 cycles | 1.00 |
| ML-DSA-87 keypair | 198669 cycles | 197907 cycles | 1.00 |
| ML-DSA-87 sign | 428529 cycles | 426619 cycles | 1.00 |
| ML-DSA-87 verify | 194582 cycles | 194362 cycles | 1.00 |
Graviton3
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 72254 cycles | 72298 cycles | 1.00 |
| ML-DSA-44 sign | 211942 cycles | 211862 cycles | 1.00 |
| ML-DSA-44 verify | 75645 cycles | 75651 cycles | 1.00 |
| ML-DSA-65 keypair | 127516 cycles | 127564 cycles | 1.00 |
| ML-DSA-65 sign | 350254 cycles | 350256 cycles | 1.00 |
| ML-DSA-65 verify | 125449 cycles | 125447 cycles | 1.00 |
| ML-DSA-87 keypair | 208196 cycles | 208014 cycles | 1.00 |
| ML-DSA-87 sign | 448893 cycles | 448910 cycles | 1.00 |
| ML-DSA-87 verify | 205308 cycles | 205681 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120311 cycles | 120460 cycles | 1.00 |
| ML-DSA-44 sign | 447300 cycles | 447234 cycles | 1.00 |
| ML-DSA-44 verify | 129710 cycles | 130455 cycles | 0.99 |
| ML-DSA-65 keypair | 204437 cycles | 203981 cycles | 1.00 |
| ML-DSA-65 sign | 728421 cycles | 730686 cycles | 1.00 |
| ML-DSA-65 verify | 209421 cycles | 210398 cycles | 1.00 |
| ML-DSA-87 keypair | 337688 cycles | 337748 cycles | 1.00 |
| ML-DSA-87 sign | 926903 cycles | 922242 cycles | 1.01 |
| ML-DSA-87 verify | 346060 cycles | 347109 cycles | 1.00 |
Graviton4 (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128224 cycles | 128233 cycles | 1.00 |
| ML-DSA-44 sign | 447684 cycles | 447406 cycles | 1.00 |
| ML-DSA-44 verify | 138331 cycles | 142161 cycles | 0.97 |
| ML-DSA-65 keypair | 220728 cycles | 220585 cycles | 1.00 |
| ML-DSA-65 sign | 727613 cycles | 726570 cycles | 1.00 |
| ML-DSA-65 verify | 223172 cycles | 223096 cycles | 1.00 |
| ML-DSA-87 keypair | 365009 cycles | 365027 cycles | 1.00 |
| ML-DSA-87 sign | 926270 cycles | 926682 cycles | 1.00 |
| ML-DSA-87 verify | 372774 cycles | 372462 cycles | 1.00 |
Graviton3 (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138503 cycles | 138431 cycles | 1.00 |
| ML-DSA-44 sign | 484053 cycles | 483804 cycles | 1.00 |
| ML-DSA-44 verify | 148725 cycles | 156357 cycles | 0.95 |
| ML-DSA-65 keypair | 241276 cycles | 241178 cycles | 1.00 |
| ML-DSA-65 sign | 792427 cycles | 792015 cycles | 1.00 |
| ML-DSA-65 verify | 241215 cycles | 241086 cycles | 1.00 |
| ML-DSA-87 keypair | 396496 cycles | 396336 cycles | 1.00 |
| ML-DSA-87 sign | 1013013 cycles | 1012796 cycles | 1.00 |
| ML-DSA-87 verify | 402599 cycles | 402305 cycles | 1.00 |
Graviton2
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113820 cycles | 113345 cycles | 1.00 |
| ML-DSA-44 sign | 357341 cycles | 356055 cycles | 1.00 |
| ML-DSA-44 verify | 118529 cycles | 118038 cycles | 1.00 |
| ML-DSA-65 keypair | 196679 cycles | 196907 cycles | 1.00 |
| ML-DSA-65 sign | 588785 cycles | 590403 cycles | 1.00 |
| ML-DSA-65 verify | 194716 cycles | 195057 cycles | 1.00 |
| ML-DSA-87 keypair | 323237 cycles | 322985 cycles | 1.00 |
| ML-DSA-87 sign | 754039 cycles | 753517 cycles | 1.00 |
| ML-DSA-87 verify | 320375 cycles | 320636 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 827476 cycles | 828088 cycles | 1.00 |
| ML-DSA-44 sign | 3238353 cycles | 3233170 cycles | 1.00 |
| ML-DSA-44 verify | 921919 cycles | 920794 cycles | 1.00 |
| ML-DSA-65 keypair | 1413613 cycles | 1413452 cycles | 1.00 |
| ML-DSA-65 sign | 5340696 cycles | 5347688 cycles | 1.00 |
| ML-DSA-65 verify | 1477470 cycles | 1477937 cycles | 1.00 |
| ML-DSA-87 keypair | 2311391 cycles | 2312894 cycles | 1.00 |
| ML-DSA-87 sign | 6659117 cycles | 6665352 cycles | 1.00 |
| ML-DSA-87 verify | 2409640 cycles | 2411069 cycles | 1.00 |
Graviton2 (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 214003 cycles | 213077 cycles | 1.00 |
| ML-DSA-44 sign | 765036 cycles | 760523 cycles | 1.01 |
| ML-DSA-44 verify | 230465 cycles | 233125 cycles | 0.99 |
| ML-DSA-65 keypair | 380442 cycles | 380915 cycles | 1.00 |
| ML-DSA-65 sign | 1253729 cycles | 1251999 cycles | 1.00 |
| ML-DSA-65 verify | 371997 cycles | 372378 cycles | 1.00 |
| ML-DSA-87 keypair | 604923 cycles | 605968 cycles | 1.00 |
| ML-DSA-87 sign | 1594853 cycles | 1593941 cycles | 1.00 |
| ML-DSA-87 verify | 619102 cycles | 617894 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 311698 cycles | 306606 cycles | 1.02 |
| ML-DSA-44 sign | 1174058 cycles | 1166146 cycles | 1.01 |
| ML-DSA-44 verify | 333560 cycles | 335430 cycles | 0.99 |
| ML-DSA-65 keypair | 550737 cycles | 562274 cycles | 0.98 |
| ML-DSA-65 sign | 1894590 cycles | 1916493 cycles | 0.99 |
| ML-DSA-65 verify | 529438 cycles | 533535 cycles | 0.99 |
| ML-DSA-87 keypair | 872695 cycles | 865006 cycles | 1.01 |
| ML-DSA-87 sign | 2468410 cycles | 2417913 cycles | 1.02 |
| ML-DSA-87 verify | 900121 cycles | 884966 cycles | 1.02 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 309195 cycles | 299195 cycles | 1.03 |
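The alert above reports a ratio of 1.03 against a threshold of 1.03, which can look contradictory. A minimal sketch of the arithmetic, assuming the ratio is current cycles divided by previous cycles and that the unrounded value is what gets compared with the threshold (both are assumptions; the function names are hypothetical and not part of github-action-benchmark):

```c
#include <stdbool.h>

/* Ratio of the current cycle count to the previous one.
 * Values above 1.0 mean the current commit is slower. */
static double bench_ratio(double current_cycles, double previous_cycles)
{
  return current_cycles / previous_cycles;
}

/* Assumed alert rule: flag a regression when the unrounded ratio
 * is strictly greater than the configured threshold. */
static bool exceeds_threshold(double current_cycles, double previous_cycles,
                              double threshold)
{
  return bench_ratio(current_cycles, previous_cycles) > threshold;
}
```

Under these assumptions, 309195 / 299195 is roughly 1.0334, which is above 1.03 even though the table shows it rounded to two decimals as 1.03.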
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 277182 cycles | 278160 cycles | 1.00 |
| ML-DSA-44 sign | 816109 cycles | 822535 cycles | 0.99 |
| ML-DSA-44 verify | 280990 cycles | 278070 cycles | 1.01 |
| ML-DSA-65 keypair | 477648 cycles | 476503 cycles | 1.00 |
| ML-DSA-65 sign | 1398700 cycles | 1347085 cycles | 1.04 |
| ML-DSA-65 verify | 461181 cycles | 456015 cycles | 1.01 |
| ML-DSA-87 keypair | 825204 cycles | 796551 cycles | 1.04 |
| ML-DSA-87 sign | 1886968 cycles | 1773335 cycles | 1.06 |
| ML-DSA-87 verify | 803609 cycles | 772360 cycles | 1.04 |
72bc3f8 to
d186f5e
Compare
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113150 cycles | 113204 cycles | 1.00 |
| ML-DSA-44 sign | 355525 cycles | 355548 cycles | 1.00 |
| ML-DSA-44 verify | 117877 cycles | 117886 cycles | 1.00 |
| ML-DSA-65 keypair | 196192 cycles | 196406 cycles | 1.00 |
| ML-DSA-65 sign | 588774 cycles | 588666 cycles | 1.00 |
| ML-DSA-65 verify | 194576 cycles | 194481 cycles | 1.00 |
| ML-DSA-87 keypair | 322391 cycles | 321917 cycles | 1.00 |
| ML-DSA-87 sign | 751848 cycles | 752728 cycles | 1.00 |
| ML-DSA-87 verify | 319927 cycles | 320132 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212745 cycles | 212659 cycles | 1.00 |
| ML-DSA-44 sign | 759537 cycles | 759393 cycles | 1.00 |
| ML-DSA-44 verify | 229014 cycles | 228980 cycles | 1.00 |
| ML-DSA-65 keypair | 380339 cycles | 380359 cycles | 1.00 |
| ML-DSA-65 sign | 1251422 cycles | 1251433 cycles | 1.00 |
| ML-DSA-65 verify | 372106 cycles | 372151 cycles | 1.00 |
| ML-DSA-87 keypair | 605932 cycles | 605385 cycles | 1.00 |
| ML-DSA-87 sign | 1591645 cycles | 1591182 cycles | 1.00 |
| ML-DSA-87 verify | 617975 cycles | 617388 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 241958 cycles | 229196 cycles | 1.06 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 464439 cycles | 465373 cycles | 1.00 |
| ML-DSA-44 sign | 2143725 cycles | 2152438 cycles | 1.00 |
| ML-DSA-44 verify | 551422 cycles | 551474 cycles | 1.00 |
| ML-DSA-65 keypair | 783189 cycles | 781420 cycles | 1.00 |
| ML-DSA-65 sign | 3519184 cycles | 3519262 cycles | 1.00 |
| ML-DSA-65 verify | 855778 cycles | 854831 cycles | 1.00 |
| ML-DSA-87 keypair | 1261568 cycles | 1263149 cycles | 1.00 |
| ML-DSA-87 sign | 4343372 cycles | 4339952 cycles | 1.00 |
| ML-DSA-87 verify | 1377900 cycles | 1379633 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 237495 cycles | 229189 cycles | 1.04 |
6faaac2 to
5b1b8a7
Compare
5b1b8a7 to
f9a6d30
Compare
mkannwischer
left a comment
Thanks @willieyz. Performance is looking good and I checked that the code is doing the correct thing. Here are a few stylistic comments.
dev/x86_64/src/poly_caddq_avx2.S
Outdated
    .balign 16
    MLD_ASM_FN_SYMBOL(poly_caddq_avx2)

    movabsq $35993616950222849, %rdx
Why are you using a 64-bit constant here? It would be much easier to follow if you take a 32-bit one:
mov $8380417, %edx
vmovd %edx, %xmm1
vpbroadcastd %xmm1, %ymm1
Unless that is slower, it should be preferred.
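For readers wondering where the 64-bit immediate comes from: it is the ML-DSA modulus 8380417 repeated in both 32-bit halves of a 64-bit word, presumably so that a single 64-bit broadcast fills adjacent 32-bit lanes (that intent is an assumption, since the broadcast instruction is not shown in the snippet). A small sketch that reconstructs the constant; the names `MLDSA_Q` and `pack_q_twice` are hypothetical:

```c
#include <stdint.h>

/* ML-DSA modulus q = 2^23 - 2^13 + 1 */
#define MLDSA_Q 8380417

/* Place q in both the upper and lower 32-bit halves of a 64-bit word. */
static uint64_t pack_q_twice(void)
{
  return ((uint64_t)MLDSA_Q << 32) | (uint64_t)MLDSA_Q;
}
```

The result equals 35993616950222849 (0x007FE001007FE001), the immediate used with `movabsq`, so the suggested 32-bit form with `vpbroadcastd` loads the same value into every lane.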
dev/x86_64/src/poly_caddq_avx2.S
Outdated
    addq $128, %rdi # advance by 128 bytes (4 vectors)
    cmpq %rdi, %rax
    jne poly_caddq_avx2_loop # 8 iterations (32/4 = 8)
    vzeroupper
We never use vzeroupper in any other AVX2 files, so this should be eliminated.
dev/x86_64/src/poly_caddq_avx2.S
Outdated
    vpcmpgtd (%rdi), %ymm2, %ymm0
    vpand %ymm1, %ymm0, %ymm0
    vpaddd (%rdi), %ymm0, %ymm0
    vmovdqa %ymm0, (%rdi)
Please wrap this in a macro - similar to the caddq code for aarch64.
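As a reading aid for the four-instruction sequence under review, here is a scalar C model of what each 32-bit lane computes. It is only a sketch: it assumes `%ymm2` holds zero and `%ymm1` holds the modulus in every lane (neither is visible in the quoted lines), and the names `caddq_scalar`, `poly_caddq_model`, `MLDSA_Q`, and `MLDSA_N` are hypothetical, not the project's API.

```c
#include <stdint.h>

#define MLDSA_Q 8380417 /* ML-DSA modulus */
#define MLDSA_N 256     /* coefficients per polynomial (32 vectors of 8) */

/* Conditionally add q: models one 32-bit lane of the AVX2 sequence. */
static int32_t caddq_scalar(int32_t a)
{
  /* vpcmpgtd with a zero first source: all-ones mask when 0 > a */
  uint32_t mask = (a < 0) ? 0xFFFFFFFFu : 0u;
  /* vpand: keep q only where the mask is set */
  uint32_t addend = mask & (uint32_t)MLDSA_Q;
  /* vpaddd: wrapping 32-bit addition */
  return (int32_t)((uint32_t)a + addend);
}

/* Apply the conditional addition to every coefficient in place. */
static void poly_caddq_model(int32_t a[MLDSA_N])
{
  for (int i = 0; i < MLDSA_N; i++)
  {
    a[i] = caddq_scalar(a[i]);
  }
}
```

Negative coefficients are shifted up by q and non-negative ones are left unchanged, which is the behaviour a per-vector macro would need to preserve.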
dev/x86_64/src/poly_caddq_avx2.S
Outdated
    vpaddd 96(%rdi), %ymm5, %ymm5
    vmovdqa %ymm5, 96(%rdi)

    addq $128, %rdi # advance by 128 bytes (4 vectors)
We never use # comments.
Please use // comments or /* */ comments like in the other files. Also try to follow their style.
7dc5f6f to
6761759
Compare
This commit adds mld_poly_caddq to the benchmark components to evaluate the performance impact of replacing the caddq AVX2 intrinsics with x86_64 assembly code. Signed-off-by: willieyz <willie.zhao@chelpis.com>
This commit replaces the current caddq AVX2 implementation with x86_64 assembly code. Signed-off-by: willieyz <willie.zhao@chelpis.com>
Signed-off-by: willieyz <willie.zhao@chelpis.com>
6761759 to
14097e6
Compare
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 14097e6 | Previous: daf271b | Ratio |
|---|---|---|---|
| ML-DSA-65 sign | 1398700 cycles | 1347085 cycles | 1.04 |
| ML-DSA-87 keypair | 825204 cycles | 796551 cycles | 1.04 |
| ML-DSA-87 sign | 1886968 cycles | 1773335 cycles | 1.06 |
| ML-DSA-87 verify | 803609 cycles | 772360 cycles | 1.04 |
poly_caddq with assembly #491
In this PR, we replace the AVX2 intrinsics implementation of poly_caddq with an x86_64 assembly version.
To estimate the performance impact, we compare the results shown in the two tables below.
Overall, for keypair, sign, and verify (opt), the performance difference is below 1%, which is consistent with the no-opt case.
In the component-level benchmark for mld_poly_caddq, the observed performance differences are at least 17%. After unrolling the loop by a factor of 4, the differences are reduced to approximately 10%.