HOL-Light/x86_64: Replace Keccakx4 intrinsics with AVX2 assembly and prove correct#989
HOL-Light/x86_64: Replace Keccakx4 intrinsics with AVX2 assembly and prove correct#989mkannwischer wants to merge 1 commit intomainfrom
Conversation
mkannwischer
commented
Mar 4, 2026
- Ports HOL-Light/x86_64: Replace Keccakx4 intrinsics with assembly and prove correct mlkem-native#1576
| @@ -0,0 +1,622 @@ | |||
| /* | |||
| * Copyright (c) The mldsa-native project authors | |||
There was a problem hiding this comment.
this needs the mlkem-native copyright too.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 4th gen (c7i)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
45629 cycles |
35118 cycles |
1.30 |
ML-DSA-44 sign |
136855 cycles |
120041 cycles |
1.14 |
ML-DSA-44 verify |
51921 cycles |
38197 cycles |
1.36 |
ML-DSA-65 keypair |
80779 cycles |
60705 cycles |
1.33 |
ML-DSA-65 sign |
225103 cycles |
200568 cycles |
1.12 |
ML-DSA-65 verify |
79184 cycles |
62638 cycles |
1.26 |
ML-DSA-87 keypair |
129599 cycles |
93448 cycles |
1.39 |
ML-DSA-87 sign |
288556 cycles |
237996 cycles |
1.21 |
ML-DSA-87 verify |
128073 cycles |
95574 cycles |
1.34 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
45629 cycles |
35118 cycles |
1.30 |
ML-DSA-44 sign |
136855 cycles |
120041 cycles |
1.14 |
ML-DSA-44 verify |
51921 cycles |
38197 cycles |
1.36 |
ML-DSA-65 keypair |
80779 cycles |
60705 cycles |
1.33 |
ML-DSA-65 sign |
225103 cycles |
200568 cycles |
1.12 |
ML-DSA-65 verify |
79184 cycles |
62638 cycles |
1.26 |
ML-DSA-87 keypair |
129599 cycles |
93448 cycles |
1.39 |
ML-DSA-87 sign |
288556 cycles |
237996 cycles |
1.21 |
ML-DSA-87 verify |
128073 cycles |
95574 cycles |
1.34 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 4th gen (c7i) (no-opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
93725 cycles |
93421 cycles |
1.00 |
ML-DSA-44 sign |
332483 cycles |
332489 cycles |
1.00 |
ML-DSA-44 verify |
99730 cycles |
99620 cycles |
1.00 |
ML-DSA-65 keypair |
160028 cycles |
159711 cycles |
1.00 |
ML-DSA-65 sign |
544630 cycles |
544428 cycles |
1.00 |
ML-DSA-65 verify |
160962 cycles |
160796 cycles |
1.00 |
ML-DSA-87 keypair |
267389 cycles |
266842 cycles |
1.00 |
ML-DSA-87 sign |
705838 cycles |
706051 cycles |
1.00 |
ML-DSA-87 verify |
270449 cycles |
269870 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
113156 cycles |
113147 cycles |
1.00 |
ML-DSA-44 sign |
355307 cycles |
355180 cycles |
1.00 |
ML-DSA-44 verify |
117744 cycles |
117748 cycles |
1.00 |
ML-DSA-65 keypair |
196269 cycles |
196464 cycles |
1.00 |
ML-DSA-65 sign |
588527 cycles |
588331 cycles |
1.00 |
ML-DSA-65 verify |
194804 cycles |
194432 cycles |
1.00 |
ML-DSA-87 keypair |
321922 cycles |
322129 cycles |
1.00 |
ML-DSA-87 sign |
751533 cycles |
752572 cycles |
1.00 |
ML-DSA-87 verify |
320045 cycles |
319915 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 3rd gen (c6a)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
58725 cycles |
69132 cycles |
0.85 |
ML-DSA-44 sign |
166373 cycles |
188226 cycles |
0.88 |
ML-DSA-44 verify |
65653 cycles |
69219 cycles |
0.95 |
ML-DSA-65 keypair |
99835 cycles |
119330 cycles |
0.84 |
ML-DSA-65 sign |
270990 cycles |
300188 cycles |
0.90 |
ML-DSA-65 verify |
97621 cycles |
115292 cycles |
0.85 |
ML-DSA-87 keypair |
162011 cycles |
203812 cycles |
0.79 |
ML-DSA-87 sign |
343164 cycles |
395329 cycles |
0.87 |
ML-DSA-87 verify |
158967 cycles |
195762 cycles |
0.81 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 3rd gen (c6i)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
63924 cycles |
56534 cycles |
1.13 |
ML-DSA-44 sign |
192530 cycles |
181554 cycles |
1.06 |
ML-DSA-44 verify |
76491 cycles |
61421 cycles |
1.25 |
ML-DSA-65 keypair |
109756 cycles |
99035 cycles |
1.11 |
ML-DSA-65 sign |
318687 cycles |
298572 cycles |
1.07 |
ML-DSA-65 verify |
110003 cycles |
100520 cycles |
1.09 |
ML-DSA-87 keypair |
174490 cycles |
152827 cycles |
1.14 |
ML-DSA-87 sign |
409230 cycles |
355251 cycles |
1.15 |
ML-DSA-87 verify |
173818 cycles |
153945 cycles |
1.13 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
63924 cycles |
56534 cycles |
1.13 |
ML-DSA-44 sign |
192530 cycles |
181554 cycles |
1.06 |
ML-DSA-44 verify |
76491 cycles |
61421 cycles |
1.25 |
ML-DSA-65 keypair |
109756 cycles |
99035 cycles |
1.11 |
ML-DSA-65 sign |
318687 cycles |
298572 cycles |
1.07 |
ML-DSA-65 verify |
110003 cycles |
100520 cycles |
1.09 |
ML-DSA-87 keypair |
174490 cycles |
152827 cycles |
1.14 |
ML-DSA-87 sign |
409230 cycles |
355251 cycles |
1.15 |
ML-DSA-87 verify |
173818 cycles |
153945 cycles |
1.13 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton4
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
68197 cycles |
68196 cycles |
1.00 |
ML-DSA-44 sign |
201921 cycles |
201973 cycles |
1.00 |
ML-DSA-44 verify |
70630 cycles |
70544 cycles |
1.00 |
ML-DSA-65 keypair |
121167 cycles |
121006 cycles |
1.00 |
ML-DSA-65 sign |
330997 cycles |
331382 cycles |
1.00 |
ML-DSA-65 verify |
117905 cycles |
118067 cycles |
1.00 |
ML-DSA-87 keypair |
198256 cycles |
198246 cycles |
1.00 |
ML-DSA-87 sign |
427019 cycles |
426898 cycles |
1.00 |
ML-DSA-87 verify |
194123 cycles |
194357 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 3rd gen (c6a) (no-opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
134581 cycles |
134725 cycles |
1.00 |
ML-DSA-44 sign |
524396 cycles |
524539 cycles |
1.00 |
ML-DSA-44 verify |
147264 cycles |
148068 cycles |
0.99 |
ML-DSA-65 keypair |
226274 cycles |
227491 cycles |
0.99 |
ML-DSA-65 sign |
860776 cycles |
864775 cycles |
1.00 |
ML-DSA-65 verify |
235167 cycles |
235581 cycles |
1.00 |
ML-DSA-87 keypair |
371123 cycles |
373600 cycles |
0.99 |
ML-DSA-87 sign |
1079312 cycles |
1084589 cycles |
1.00 |
ML-DSA-87 verify |
382843 cycles |
385427 cycles |
0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 4th gen (c7a)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
49703 cycles |
40899 cycles |
1.22 |
ML-DSA-44 sign |
145345 cycles |
133167 cycles |
1.09 |
ML-DSA-44 verify |
55456 cycles |
43772 cycles |
1.27 |
ML-DSA-65 keypair |
84580 cycles |
71918 cycles |
1.18 |
ML-DSA-65 sign |
234603 cycles |
214333 cycles |
1.09 |
ML-DSA-65 verify |
84067 cycles |
72735 cycles |
1.16 |
ML-DSA-87 keypair |
132643 cycles |
108000 cycles |
1.23 |
ML-DSA-87 sign |
298105 cycles |
251840 cycles |
1.18 |
ML-DSA-87 verify |
133839 cycles |
110205 cycles |
1.21 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
49703 cycles |
40899 cycles |
1.22 |
ML-DSA-44 sign |
145345 cycles |
133167 cycles |
1.09 |
ML-DSA-44 verify |
55456 cycles |
43772 cycles |
1.27 |
ML-DSA-65 keypair |
84580 cycles |
71918 cycles |
1.18 |
ML-DSA-65 sign |
234603 cycles |
214333 cycles |
1.09 |
ML-DSA-65 verify |
84067 cycles |
72735 cycles |
1.16 |
ML-DSA-87 keypair |
132643 cycles |
108000 cycles |
1.23 |
ML-DSA-87 sign |
298105 cycles |
251840 cycles |
1.18 |
ML-DSA-87 verify |
133839 cycles |
110205 cycles |
1.21 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Intel Xeon 3rd gen (c6i) (no-opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
157397 cycles |
157349 cycles |
1.00 |
ML-DSA-44 sign |
549527 cycles |
550739 cycles |
1.00 |
ML-DSA-44 verify |
168944 cycles |
169371 cycles |
1.00 |
ML-DSA-65 keypair |
267570 cycles |
268104 cycles |
1.00 |
ML-DSA-65 sign |
902641 cycles |
903355 cycles |
1.00 |
ML-DSA-65 verify |
273909 cycles |
274822 cycles |
1.00 |
ML-DSA-87 keypair |
448160 cycles |
448166 cycles |
1.00 |
ML-DSA-87 sign |
1156925 cycles |
1157655 cycles |
1.00 |
ML-DSA-87 verify |
456898 cycles |
458080 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
There was a problem hiding this comment.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
212495 cycles |
212671 cycles |
1.00 |
ML-DSA-44 sign |
759753 cycles |
759391 cycles |
1.00 |
ML-DSA-44 verify |
228771 cycles |
228993 cycles |
1.00 |
ML-DSA-65 keypair |
379845 cycles |
380383 cycles |
1.00 |
ML-DSA-65 sign |
1251969 cycles |
1251394 cycles |
1.00 |
ML-DSA-65 verify |
371447 cycles |
372186 cycles |
1.00 |
ML-DSA-87 keypair |
604540 cycles |
605169 cycles |
1.00 |
ML-DSA-87 sign |
1593372 cycles |
1591399 cycles |
1.00 |
ML-DSA-87 verify |
618234 cycles |
617312 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton4 (no-opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
128414 cycles |
128200 cycles |
1.00 |
ML-DSA-44 sign |
447242 cycles |
447698 cycles |
1.00 |
ML-DSA-44 verify |
141001 cycles |
144650 cycles |
0.97 |
ML-DSA-65 keypair |
220837 cycles |
220629 cycles |
1.00 |
ML-DSA-65 sign |
726461 cycles |
727485 cycles |
1.00 |
ML-DSA-65 verify |
222935 cycles |
223203 cycles |
1.00 |
ML-DSA-87 keypair |
364897 cycles |
365009 cycles |
1.00 |
ML-DSA-87 sign |
926304 cycles |
925883 cycles |
1.00 |
ML-DSA-87 verify |
372305 cycles |
372797 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton3
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
72361 cycles |
72238 cycles |
1.00 |
ML-DSA-44 sign |
212035 cycles |
211968 cycles |
1.00 |
ML-DSA-44 verify |
75684 cycles |
75586 cycles |
1.00 |
ML-DSA-65 keypair |
127590 cycles |
127524 cycles |
1.00 |
ML-DSA-65 sign |
350282 cycles |
350022 cycles |
1.00 |
ML-DSA-65 verify |
125320 cycles |
125427 cycles |
1.00 |
ML-DSA-87 keypair |
205650 cycles |
208263 cycles |
0.99 |
ML-DSA-87 sign |
443568 cycles |
448961 cycles |
0.99 |
ML-DSA-87 verify |
205413 cycles |
205274 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
AMD EPYC 4th gen (c7a) (no-opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
120485 cycles |
120079 cycles |
1.00 |
ML-DSA-44 sign |
445920 cycles |
445606 cycles |
1.00 |
ML-DSA-44 verify |
130152 cycles |
130275 cycles |
1.00 |
ML-DSA-65 keypair |
204352 cycles |
204060 cycles |
1.00 |
ML-DSA-65 sign |
727952 cycles |
727326 cycles |
1.00 |
ML-DSA-65 verify |
209414 cycles |
209046 cycles |
1.00 |
ML-DSA-87 keypair |
337846 cycles |
337824 cycles |
1.00 |
ML-DSA-87 sign |
922148 cycles |
922255 cycles |
1.00 |
ML-DSA-87 verify |
346380 cycles |
346248 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton3 (no-opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
138798 cycles |
138472 cycles |
1.00 |
ML-DSA-44 sign |
483705 cycles |
483937 cycles |
1.00 |
ML-DSA-44 verify |
153226 cycles |
162299 cycles |
0.94 |
ML-DSA-65 keypair |
241550 cycles |
241394 cycles |
1.00 |
ML-DSA-65 sign |
791927 cycles |
792376 cycles |
1.00 |
ML-DSA-65 verify |
240894 cycles |
241234 cycles |
1.00 |
ML-DSA-87 keypair |
395843 cycles |
396580 cycles |
1.00 |
ML-DSA-87 sign |
1012272 cycles |
1012710 cycles |
1.00 |
ML-DSA-87 verify |
402006 cycles |
402628 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton2
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
113294 cycles |
113500 cycles |
1.00 |
ML-DSA-44 sign |
355463 cycles |
355773 cycles |
1.00 |
ML-DSA-44 verify |
118011 cycles |
118295 cycles |
1.00 |
ML-DSA-65 keypair |
196529 cycles |
196483 cycles |
1.00 |
ML-DSA-65 sign |
588987 cycles |
588268 cycles |
1.00 |
ML-DSA-65 verify |
194713 cycles |
194737 cycles |
1.00 |
ML-DSA-87 keypair |
322687 cycles |
323067 cycles |
1.00 |
ML-DSA-87 sign |
751591 cycles |
753282 cycles |
1.00 |
ML-DSA-87 verify |
319983 cycles |
320275 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot
left a comment
There was a problem hiding this comment.
Graviton2 (no-opt)
Details
| Benchmark suite | Current: 9fd0d76 | Previous: 339e496 | Ratio |
|---|---|---|---|
ML-DSA-44 keypair |
212577 cycles |
212962 cycles |
1.00 |
ML-DSA-44 sign |
760091 cycles |
761109 cycles |
1.00 |
ML-DSA-44 verify |
229011 cycles |
234656 cycles |
0.98 |
ML-DSA-65 keypair |
380531 cycles |
380376 cycles |
1.00 |
ML-DSA-65 sign |
1253366 cycles |
1253515 cycles |
1.00 |
ML-DSA-65 verify |
371841 cycles |
371858 cycles |
1.00 |
ML-DSA-87 keypair |
604760 cycles |
604334 cycles |
1.00 |
ML-DSA-87 sign |
1594966 cycles |
1594506 cycles |
1.00 |
ML-DSA-87 verify |
618486 cycles |
618425 cycles |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
…prove correct - Ports pq-code-package/mlkem-native#1576 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>