Skip to content

HOL-Light/x86_64: Replace Keccakx4 intrinsics with AVX2 assembly and prove correct#989

Draft
mkannwischer wants to merge 1 commit intomainfrom
avx-keccak
Draft

HOL-Light/x86_64: Replace Keccakx4 intrinsics with AVX2 assembly and prove correct#989
mkannwischer wants to merge 1 commit intomainfrom
avx-keccak

Conversation

@mkannwischer
Copy link
Contributor

@@ -0,0 +1,622 @@
/*
* Copyright (c) The mldsa-native project authors
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs the mlkem-native copyright too.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 45629 cycles 35118 cycles 1.30
ML-DSA-44 sign 136855 cycles 120041 cycles 1.14
ML-DSA-44 verify 51921 cycles 38197 cycles 1.36
ML-DSA-65 keypair 80779 cycles 60705 cycles 1.33
ML-DSA-65 sign 225103 cycles 200568 cycles 1.12
ML-DSA-65 verify 79184 cycles 62638 cycles 1.26
ML-DSA-87 keypair 129599 cycles 93448 cycles 1.39
ML-DSA-87 sign 288556 cycles 237996 cycles 1.21
ML-DSA-87 verify 128073 cycles 95574 cycles 1.34

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 45629 cycles 35118 cycles 1.30
ML-DSA-44 sign 136855 cycles 120041 cycles 1.14
ML-DSA-44 verify 51921 cycles 38197 cycles 1.36
ML-DSA-65 keypair 80779 cycles 60705 cycles 1.33
ML-DSA-65 sign 225103 cycles 200568 cycles 1.12
ML-DSA-65 verify 79184 cycles 62638 cycles 1.26
ML-DSA-87 keypair 129599 cycles 93448 cycles 1.39
ML-DSA-87 sign 288556 cycles 237996 cycles 1.21
ML-DSA-87 verify 128073 cycles 95574 cycles 1.34

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i) (no-opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 93725 cycles 93421 cycles 1.00
ML-DSA-44 sign 332483 cycles 332489 cycles 1.00
ML-DSA-44 verify 99730 cycles 99620 cycles 1.00
ML-DSA-65 keypair 160028 cycles 159711 cycles 1.00
ML-DSA-65 sign 544630 cycles 544428 cycles 1.00
ML-DSA-65 verify 160962 cycles 160796 cycles 1.00
ML-DSA-87 keypair 267389 cycles 266842 cycles 1.00
ML-DSA-87 sign 705838 cycles 706051 cycles 1.00
ML-DSA-87 verify 270449 cycles 269870 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 113156 cycles 113147 cycles 1.00
ML-DSA-44 sign 355307 cycles 355180 cycles 1.00
ML-DSA-44 verify 117744 cycles 117748 cycles 1.00
ML-DSA-65 keypair 196269 cycles 196464 cycles 1.00
ML-DSA-65 sign 588527 cycles 588331 cycles 1.00
ML-DSA-65 verify 194804 cycles 194432 cycles 1.00
ML-DSA-87 keypair 321922 cycles 322129 cycles 1.00
ML-DSA-87 sign 751533 cycles 752572 cycles 1.00
ML-DSA-87 verify 320045 cycles 319915 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 58725 cycles 69132 cycles 0.85
ML-DSA-44 sign 166373 cycles 188226 cycles 0.88
ML-DSA-44 verify 65653 cycles 69219 cycles 0.95
ML-DSA-65 keypair 99835 cycles 119330 cycles 0.84
ML-DSA-65 sign 270990 cycles 300188 cycles 0.90
ML-DSA-65 verify 97621 cycles 115292 cycles 0.85
ML-DSA-87 keypair 162011 cycles 203812 cycles 0.79
ML-DSA-87 sign 343164 cycles 395329 cycles 0.87
ML-DSA-87 verify 158967 cycles 195762 cycles 0.81

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 63924 cycles 56534 cycles 1.13
ML-DSA-44 sign 192530 cycles 181554 cycles 1.06
ML-DSA-44 verify 76491 cycles 61421 cycles 1.25
ML-DSA-65 keypair 109756 cycles 99035 cycles 1.11
ML-DSA-65 sign 318687 cycles 298572 cycles 1.07
ML-DSA-65 verify 110003 cycles 100520 cycles 1.09
ML-DSA-87 keypair 174490 cycles 152827 cycles 1.14
ML-DSA-87 sign 409230 cycles 355251 cycles 1.15
ML-DSA-87 verify 173818 cycles 153945 cycles 1.13

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 63924 cycles 56534 cycles 1.13
ML-DSA-44 sign 192530 cycles 181554 cycles 1.06
ML-DSA-44 verify 76491 cycles 61421 cycles 1.25
ML-DSA-65 keypair 109756 cycles 99035 cycles 1.11
ML-DSA-65 sign 318687 cycles 298572 cycles 1.07
ML-DSA-65 verify 110003 cycles 100520 cycles 1.09
ML-DSA-87 keypair 174490 cycles 152827 cycles 1.14
ML-DSA-87 sign 409230 cycles 355251 cycles 1.15
ML-DSA-87 verify 173818 cycles 153945 cycles 1.13

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 68197 cycles 68196 cycles 1.00
ML-DSA-44 sign 201921 cycles 201973 cycles 1.00
ML-DSA-44 verify 70630 cycles 70544 cycles 1.00
ML-DSA-65 keypair 121167 cycles 121006 cycles 1.00
ML-DSA-65 sign 330997 cycles 331382 cycles 1.00
ML-DSA-65 verify 117905 cycles 118067 cycles 1.00
ML-DSA-87 keypair 198256 cycles 198246 cycles 1.00
ML-DSA-87 sign 427019 cycles 426898 cycles 1.00
ML-DSA-87 verify 194123 cycles 194357 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a) (no-opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 134581 cycles 134725 cycles 1.00
ML-DSA-44 sign 524396 cycles 524539 cycles 1.00
ML-DSA-44 verify 147264 cycles 148068 cycles 0.99
ML-DSA-65 keypair 226274 cycles 227491 cycles 0.99
ML-DSA-65 sign 860776 cycles 864775 cycles 1.00
ML-DSA-65 verify 235167 cycles 235581 cycles 1.00
ML-DSA-87 keypair 371123 cycles 373600 cycles 0.99
ML-DSA-87 sign 1079312 cycles 1084589 cycles 1.00
ML-DSA-87 verify 382843 cycles 385427 cycles 0.99

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 49703 cycles 40899 cycles 1.22
ML-DSA-44 sign 145345 cycles 133167 cycles 1.09
ML-DSA-44 verify 55456 cycles 43772 cycles 1.27
ML-DSA-65 keypair 84580 cycles 71918 cycles 1.18
ML-DSA-65 sign 234603 cycles 214333 cycles 1.09
ML-DSA-65 verify 84067 cycles 72735 cycles 1.16
ML-DSA-87 keypair 132643 cycles 108000 cycles 1.23
ML-DSA-87 sign 298105 cycles 251840 cycles 1.18
ML-DSA-87 verify 133839 cycles 110205 cycles 1.21

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 49703 cycles 40899 cycles 1.22
ML-DSA-44 sign 145345 cycles 133167 cycles 1.09
ML-DSA-44 verify 55456 cycles 43772 cycles 1.27
ML-DSA-65 keypair 84580 cycles 71918 cycles 1.18
ML-DSA-65 sign 234603 cycles 214333 cycles 1.09
ML-DSA-65 verify 84067 cycles 72735 cycles 1.16
ML-DSA-87 keypair 132643 cycles 108000 cycles 1.23
ML-DSA-87 sign 298105 cycles 251840 cycles 1.18
ML-DSA-87 verify 133839 cycles 110205 cycles 1.21

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i) (no-opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 157397 cycles 157349 cycles 1.00
ML-DSA-44 sign 549527 cycles 550739 cycles 1.00
ML-DSA-44 verify 168944 cycles 169371 cycles 1.00
ML-DSA-65 keypair 267570 cycles 268104 cycles 1.00
ML-DSA-65 sign 902641 cycles 903355 cycles 1.00
ML-DSA-65 verify 273909 cycles 274822 cycles 1.00
ML-DSA-87 keypair 448160 cycles 448166 cycles 1.00
ML-DSA-87 sign 1156925 cycles 1157655 cycles 1.00
ML-DSA-87 verify 456898 cycles 458080 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 212495 cycles 212671 cycles 1.00
ML-DSA-44 sign 759753 cycles 759391 cycles 1.00
ML-DSA-44 verify 228771 cycles 228993 cycles 1.00
ML-DSA-65 keypair 379845 cycles 380383 cycles 1.00
ML-DSA-65 sign 1251969 cycles 1251394 cycles 1.00
ML-DSA-65 verify 371447 cycles 372186 cycles 1.00
ML-DSA-87 keypair 604540 cycles 605169 cycles 1.00
ML-DSA-87 sign 1593372 cycles 1591399 cycles 1.00
ML-DSA-87 verify 618234 cycles 617312 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4 (no-opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 128414 cycles 128200 cycles 1.00
ML-DSA-44 sign 447242 cycles 447698 cycles 1.00
ML-DSA-44 verify 141001 cycles 144650 cycles 0.97
ML-DSA-65 keypair 220837 cycles 220629 cycles 1.00
ML-DSA-65 sign 726461 cycles 727485 cycles 1.00
ML-DSA-65 verify 222935 cycles 223203 cycles 1.00
ML-DSA-87 keypair 364897 cycles 365009 cycles 1.00
ML-DSA-87 sign 926304 cycles 925883 cycles 1.00
ML-DSA-87 verify 372305 cycles 372797 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 72361 cycles 72238 cycles 1.00
ML-DSA-44 sign 212035 cycles 211968 cycles 1.00
ML-DSA-44 verify 75684 cycles 75586 cycles 1.00
ML-DSA-65 keypair 127590 cycles 127524 cycles 1.00
ML-DSA-65 sign 350282 cycles 350022 cycles 1.00
ML-DSA-65 verify 125320 cycles 125427 cycles 1.00
ML-DSA-87 keypair 205650 cycles 208263 cycles 0.99
ML-DSA-87 sign 443568 cycles 448961 cycles 0.99
ML-DSA-87 verify 205413 cycles 205274 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a) (no-opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 120485 cycles 120079 cycles 1.00
ML-DSA-44 sign 445920 cycles 445606 cycles 1.00
ML-DSA-44 verify 130152 cycles 130275 cycles 1.00
ML-DSA-65 keypair 204352 cycles 204060 cycles 1.00
ML-DSA-65 sign 727952 cycles 727326 cycles 1.00
ML-DSA-65 verify 209414 cycles 209046 cycles 1.00
ML-DSA-87 keypair 337846 cycles 337824 cycles 1.00
ML-DSA-87 sign 922148 cycles 922255 cycles 1.00
ML-DSA-87 verify 346380 cycles 346248 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3 (no-opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 138798 cycles 138472 cycles 1.00
ML-DSA-44 sign 483705 cycles 483937 cycles 1.00
ML-DSA-44 verify 153226 cycles 162299 cycles 0.94
ML-DSA-65 keypair 241550 cycles 241394 cycles 1.00
ML-DSA-65 sign 791927 cycles 792376 cycles 1.00
ML-DSA-65 verify 240894 cycles 241234 cycles 1.00
ML-DSA-87 keypair 395843 cycles 396580 cycles 1.00
ML-DSA-87 sign 1012272 cycles 1012710 cycles 1.00
ML-DSA-87 verify 402006 cycles 402628 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 113294 cycles 113500 cycles 1.00
ML-DSA-44 sign 355463 cycles 355773 cycles 1.00
ML-DSA-44 verify 118011 cycles 118295 cycles 1.00
ML-DSA-65 keypair 196529 cycles 196483 cycles 1.00
ML-DSA-65 sign 588987 cycles 588268 cycles 1.00
ML-DSA-65 verify 194713 cycles 194737 cycles 1.00
ML-DSA-87 keypair 322687 cycles 323067 cycles 1.00
ML-DSA-87 sign 751591 cycles 753282 cycles 1.00
ML-DSA-87 verify 319983 cycles 320275 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2 (no-opt)

Details
Benchmark suite Current: 9fd0d76 Previous: 339e496 Ratio
ML-DSA-44 keypair 212577 cycles 212962 cycles 1.00
ML-DSA-44 sign 760091 cycles 761109 cycles 1.00
ML-DSA-44 verify 229011 cycles 234656 cycles 0.98
ML-DSA-65 keypair 380531 cycles 380376 cycles 1.00
ML-DSA-65 sign 1253366 cycles 1253515 cycles 1.00
ML-DSA-65 verify 371841 cycles 371858 cycles 1.00
ML-DSA-87 keypair 604760 cycles 604334 cycles 1.00
ML-DSA-87 sign 1594966 cycles 1594506 cycles 1.00
ML-DSA-87 verify 618486 cycles 618425 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

…prove correct

- Ports pq-code-package/mlkem-native#1576

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants