PPC64 ASM: AES-ECB/CBC/CTR/GCM#9852

Open
SparkiDev wants to merge 1 commit into wolfSSL:master from SparkiDev:ppc64_asm_aes

@SparkiDev
Contributor

Description

To turn on assembly:
--enable-ppc64-asm
To build as C code with inline assembly:
--enable-ppc64-asm=inline

To disable hardening (when physical access to device is not possible):
WOLFSSL_PPC64_ASM_AES_NO_HARDEN

AES-GCM works with either 4-bit (default) or table:
--enable-aesgcm=table
Using 'table' is faster for encryption/decryption.

Testing

./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr CFLAGS=-DWOLFSSL_PPC64_ASM_AES_NO_HARDEN --enable-ppc64-asm
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr CFLAGS=-DWOLFSSL_PPC64_ASM_AES_NO_HARDEN --enable-ppc64-asm=inline
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr --enable-ppc64-asm
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr --enable-ppc64-asm=inline
./configure --disable-shared LDFLAGS=--static --host=powerpc64 CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc --enable-aesgcm=table --enable-aesctr
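
The five commands above differ only in their last one or two arguments. A minimal sketch that rebuilds them from the shared part may make the test matrix easier to see. The option strings are copied from the commands above; the grouping (assembly vs. inline, each with and without the NO_HARDEN define, plus one build without PPC64 assembly) is a reading of those commands, and the names `COMMON`, `NO_HARDEN`, `configure_line` and `build_matrix` are illustrative only, not part of wolfSSL.

```python
# Sketch: rebuild the five ./configure command lines from their shared parts.
# Option strings are copied from the PR description; helper names are
# hypothetical and exist only for this illustration.

COMMON = (
    "./configure --disable-shared LDFLAGS=--static --host=powerpc64 "
    "CC=powerpc64-linux-gnu-gcc --enable-aesecb --enable-aescbc "
    "--enable-aesgcm=table --enable-aesctr"
)
NO_HARDEN = "CFLAGS=-DWOLFSSL_PPC64_ASM_AES_NO_HARDEN"


def configure_line(asm=None, harden=True):
    """Return one configure command line.

    asm:    None     -> no PPC64 assembly option
            ""       -> --enable-ppc64-asm
            "inline" -> --enable-ppc64-asm=inline
    harden: when False, pass the NO_HARDEN define through CFLAGS.
    """
    parts = [COMMON]
    if not harden:
        parts.append(NO_HARDEN)
    if asm is not None:
        parts.append("--enable-ppc64-asm" + ("=" + asm if asm else ""))
    return " ".join(parts)


def build_matrix():
    """The five configurations, in the order they are listed above."""
    return [
        configure_line(asm="", harden=False),
        configure_line(asm="inline", harden=False),
        configure_line(asm=""),
        configure_line(asm="inline"),
        configure_line(),
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for line in build_matrix():
        print(line)
```

The script only prints the command lines; running them still needs the powerpc64 cross toolchain named in `CC`.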

@SparkiDev SparkiDev self-assigned this Mar 3, 2026
@SparkiDev
Contributor Author

PPC64 assembly code generated with PR:
https://github.com/wolfSSL/scripts/pull/556

@SparkiDev SparkiDev force-pushed the ppc64_asm_aes branch 2 times, most recently from fcf8f3e to b606231 Compare March 3, 2026 04:23
@SparkiDev
Contributor Author

retest this please

@dgarske dgarske self-requested a review March 3, 2026 20:16
@dgarske
Contributor

dgarske commented Mar 5, 2026

Initial benchmarks on an NXP T2080 (e6500) core with 1.8GHz core clock:

With PR 9852:

AES-256-GCM-enc          13 MiB took 1.000 seconds, 13.051 MiB/s
AES-256-GCM-dec          13 MiB took 1.001 seconds, 13.044 MiB/s

With master:

AES-256-GCM-enc          15 MiB took 1.000 seconds, 15.305 MiB/s
AES-256-GCM-dec          7 MiB took 1.001 seconds, 7.901 MiB/s

@dgarske
Contributor

dgarske commented Mar 5, 2026

Oh, I did not try with WOLFSSL_PPC64_ASM_AES_NO_HARDEN. I also used the 4-bit table, not --enable-aesgcm=table. Let me run a few more tests.

@dgarske
Contributor

dgarske commented Mar 5, 2026

-O3, AES GCM Table, SHA256 C

Master:

AES-128-CBC-enc          63 MiB took 1.000 seconds, 63.275 MiB/s
AES-128-CBC-dec          65 MiB took 1.000 seconds, 65.966 MiB/s
AES-192-CBC-enc          55 MiB took 1.000 seconds, 55.034 MiB/s
AES-192-CBC-dec          57 MiB took 1.000 seconds, 57.055 MiB/s
AES-256-CBC-enc          48 MiB took 1.000 seconds, 48.796 MiB/s
AES-256-CBC-dec          50 MiB took 1.000 seconds, 50.359 MiB/s
AES-128-GCM-enc          16 MiB took 1.001 seconds, 16.871 MiB/s
AES-128-GCM-dec          8 MiB took 1.000 seconds, 8.488 MiB/s
AES-192-GCM-enc          16 MiB took 1.000 seconds, 16.227 MiB/s
AES-192-GCM-dec          8 MiB took 1.000 seconds, 8.318 MiB/s
AES-256-GCM-enc          15 MiB took 1.000 seconds, 15.618 MiB/s
AES-256-GCM-dec          8 MiB took 1.001 seconds, 8.145 MiB/s
AES-128-GCM-enc-no_AAD   17 MiB took 1.000 seconds, 17.073 MiB/s
AES-128-GCM-dec-no_AAD   8 MiB took 1.000 seconds, 8.537 MiB/s
AES-192-GCM-enc-no_AAD   16 MiB took 1.000 seconds, 16.392 MiB/s
AES-192-GCM-dec-no_AAD   8 MiB took 1.001 seconds, 8.365 MiB/s
AES-256-GCM-enc-no_AAD   15 MiB took 1.000 seconds, 15.786 MiB/s
AES-256-GCM-dec-no_AAD   8 MiB took 1.001 seconds, 8.190 MiB/s
GMAC Table               22 MiB took 1.000 seconds, 22.948 MiB/s
SHA-256                  79 MiB took 1.000 seconds, 79.707 MiB/s
SHA-384                  36 MiB took 1.000 seconds, 36.723 MiB/s
SHA-512                  36 MiB took 1.000 seconds, 36.743 MiB/s
SHA-512/224              36 MiB took 1.000 seconds, 36.761 MiB/s
SHA-512/256              36 MiB took 1.000 seconds, 36.757 MiB/s
HMAC-SHA256              79 MiB took 1.000 seconds, 79.020 MiB/s
HMAC-SHA384              36 MiB took 1.000 seconds, 36.194 MiB/s
HMAC-SHA512              36 MiB took 1.000 seconds, 36.188 MiB/s

PR 9852 with WOLFSSL_PPC64_ASM WOLFSSL_PPC64_ASM_INLINE WOLFSSL_PPC64_ASM_SMALL WOLFSSL_PPC64_ASM_AES_NO_HARDEN WOLFSSL_PPC32_ASM WOLFSSL_PPC32_ASM_INLINE WOLFSSL_PPC32_ASM_SMALL

AES-128-CBC-enc          69 MiB took 1.000 seconds, 69.060 MiB/s
AES-128-CBC-dec          73 MiB took 1.000 seconds, 73.363 MiB/s
AES-192-CBC-enc          59 MiB took 1.000 seconds, 59.358 MiB/s
AES-192-CBC-dec          62 MiB took 1.000 seconds, 62.510 MiB/s
AES-256-CBC-enc          52 MiB took 1.000 seconds, 52.017 MiB/s
AES-256-CBC-dec          54 MiB took 1.000 seconds, 54.347 MiB/s
AES-128-GCM-enc          17 MiB took 1.000 seconds, 17.891 MiB/s
AES-128-GCM-dec          17 MiB took 1.001 seconds, 17.922 MiB/s
AES-192-GCM-enc          17 MiB took 1.001 seconds, 17.143 MiB/s
AES-192-GCM-dec          17 MiB took 1.000 seconds, 17.179 MiB/s
AES-256-GCM-enc          16 MiB took 1.000 seconds, 16.479 MiB/s
AES-256-GCM-dec          16 MiB took 1.000 seconds, 16.512 MiB/s
AES-128-GCM-enc-no_AAD   18 MiB took 1.001 seconds, 18.092 MiB/s
AES-128-GCM-dec-no_AAD   18 MiB took 1.000 seconds, 18.130 MiB/s
AES-192-GCM-enc-no_AAD   17 MiB took 1.001 seconds, 17.334 MiB/s
AES-192-GCM-dec-no_AAD   17 MiB took 1.000 seconds, 17.369 MiB/s
AES-256-GCM-enc-no_AAD   16 MiB took 1.001 seconds, 16.654 MiB/s
AES-256-GCM-dec-no_AAD   16 MiB took 1.000 seconds, 16.687 MiB/s
GMAC Table               24 MiB took 1.000 seconds, 24.648 MiB/s
SHA-256                  67 MiB took 1.000 seconds, 67.083 MiB/s
SHA-384                  36 MiB took 1.000 seconds, 36.714 MiB/s
SHA-512                  36 MiB took 1.000 seconds, 36.720 MiB/s
SHA-512/224              36 MiB took 1.000 seconds, 36.668 MiB/s
SHA-512/256              36 MiB took 1.000 seconds, 36.671 MiB/s
HMAC-SHA256              66 MiB took 1.000 seconds, 66.476 MiB/s
HMAC-SHA384              36 MiB took 1.000 seconds, 36.123 MiB/s
HMAC-SHA512              36 MiB took 1.000 seconds, 36.122 MiB/s

PR 9852 with WOLFSSL_PPC64_ASM WOLFSSL_PPC64_ASM_INLINE WOLFSSL_PPC64_ASM_AES_NO_HARDEN WOLFSSL_PPC32_ASM WOLFSSL_PPC32_ASM_INLINE

AES-128-CBC-enc          69 MiB took 1.000 seconds, 69.025 MiB/s
AES-128-CBC-dec          73 MiB took 1.000 seconds, 73.354 MiB/s
AES-192-CBC-enc          59 MiB took 1.000 seconds, 59.333 MiB/s
AES-192-CBC-dec          62 MiB took 1.000 seconds, 62.503 MiB/s
AES-256-CBC-enc          52 MiB took 1.000 seconds, 52.133 MiB/s
AES-256-CBC-dec          54 MiB took 1.000 seconds, 54.351 MiB/s
AES-128-GCM-enc          17 MiB took 1.000 seconds, 17.882 MiB/s
AES-128-GCM-dec          17 MiB took 1.000 seconds, 17.914 MiB/s
AES-192-GCM-enc          17 MiB took 1.000 seconds, 17.146 MiB/s
AES-192-GCM-dec          17 MiB took 1.000 seconds, 17.175 MiB/s
AES-256-GCM-enc          16 MiB took 1.000 seconds, 16.467 MiB/s
AES-256-GCM-dec          16 MiB took 1.001 seconds, 16.510 MiB/s
AES-128-GCM-enc-no_AAD   18 MiB took 1.001 seconds, 18.094 MiB/s
AES-128-GCM-dec-no_AAD   18 MiB took 1.001 seconds, 18.119 MiB/s
AES-192-GCM-enc-no_AAD   17 MiB took 1.001 seconds, 17.339 MiB/s
AES-192-GCM-dec-no_AAD   17 MiB took 1.001 seconds, 17.363 MiB/s
AES-256-GCM-enc-no_AAD   16 MiB took 1.000 seconds, 16.641 MiB/s
AES-256-GCM-dec-no_AAD   16 MiB took 1.000 seconds, 16.684 MiB/s
GMAC Table               24 MiB took 1.000 seconds, 24.648 MiB/s
SHA-256                  70 MiB took 1.000 seconds, 70.384 MiB/s
SHA-384                  36 MiB took 1.000 seconds, 36.681 MiB/s
SHA-512                  36 MiB took 1.000 seconds, 36.669 MiB/s
SHA-512/224              36 MiB took 1.000 seconds, 36.719 MiB/s
SHA-512/256              36 MiB took 1.000 seconds, 36.720 MiB/s
HMAC-SHA256              69 MiB took 1.000 seconds, 69.747 MiB/s
HMAC-SHA384              36 MiB took 1.000 seconds, 36.155 MiB/s
HMAC-SHA512              36 MiB took 1.000 seconds, 36.159 MiB/s
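
For comparing runs like these, a small sketch that parses the benchmark lines and reports the percent change per algorithm might help. It assumes the line format shown in the tables above (`name  N MiB took T seconds, R MiB/s`). Only three rows are copied in, from the "Master" table and the last "PR 9852" table; the names `parse` and `percent_change` are illustrative and not part of the wolfCrypt benchmark tool.

```python
import re

# Three rows copied from the "Master" table above.
MASTER = """\
AES-128-CBC-enc          63 MiB took 1.000 seconds, 63.275 MiB/s
AES-128-GCM-dec          8 MiB took 1.000 seconds, 8.488 MiB/s
SHA-256                  79 MiB took 1.000 seconds, 79.707 MiB/s
"""

# The same three rows from the last "PR 9852" table above.
PR = """\
AES-128-CBC-enc          69 MiB took 1.000 seconds, 69.025 MiB/s
AES-128-GCM-dec          17 MiB took 1.000 seconds, 17.914 MiB/s
SHA-256                  70 MiB took 1.000 seconds, 70.384 MiB/s
"""

# name, whole MiB processed, elapsed seconds, throughput in MiB/s
LINE = re.compile(r"^(\S.*?)\s+\d+ MiB took [\d.]+ seconds, ([\d.]+) MiB/s$")


def parse(text):
    """Map algorithm name -> throughput in MiB/s for each matching line."""
    rates = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            rates[m.group(1)] = float(m.group(2))
    return rates


def percent_change(before, after):
    """Percent change in throughput for names present in both runs."""
    return {
        name: round(100.0 * (after[name] - before[name]) / before[name], 1)
        for name in before
        if name in after
    }


if __name__ == "__main__":
    changes = percent_change(parse(MASTER), parse(PR))
    for name, pct in changes.items():
        print(f"{name:<24}{pct:+.1f}%")
```

Pasting the full tables into `MASTER` and `PR` compares every row, including multi-word names such as `GMAC Table`.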

dgarske previously approved these changes Mar 5, 2026
Contributor

@dgarske dgarske left a comment

Benchmarks posted. Marking approved, but won't consider merge until you have a chance to evaluate results. I will also work on running on an e5500 core.

@Scottcjn

Scottcjn commented Mar 9, 2026

Excellent — PPC64 ASM AES is something we have been wanting. We use TLS extensively for our RustChain blockchain attestation nodes and Ergo anchor transactions.

Available for testing:

  • IBM POWER8 S824 — ppc64 (big-endian), 16c/128t, 512GB RAM, GCC 10
  • Power Mac G5 — ppc64 big-endian, Dual 2.0GHz 970

Would be happy to benchmark AES-GCM and AES-CTR throughput on POWER8 before and after this PR. Let us know if test results from real hardware would be useful for review.

@SparkiDev
Contributor Author

Hi David,

Please run the performance numbers with the latest version of the code.

Thanks!
Sean

@SparkiDev
Contributor Author

SparkiDev commented Mar 9, 2026

Hi @Scottcjn,

I have implemented AES-ECB/CBC/CTR/GCM.
If you have time to generate the performance numbers for these modes on any available computers, it would be appreciated.
First though, I need to get the assembly code working on those machines.
Please let me know what compilation errors you see using this code.

Thanks,
Sean

@dgarske dgarske self-requested a review March 9, 2026 02:56
@SparkiDev
Contributor Author

retest this please

@SparkiDev
Contributor Author

retest this please
