Optimized C++ Implementation

The optimized architecture-specific implementation supports x86-64 and AArch64, and uses ISA extensions to accelerate AES and other operations.
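As a rough illustration of what "uses ISA extensions" means at build time, the sketch below reports which of the extensions named in this section a translation unit was compiled for. It relies only on macros predefined by GCC and Clang (`__AVX2__`, `__AES__`, `__ARM_FEATURE_AES`, and the older `__ARM_FEATURE_CRYPTO`). The function name `compiled_isa_backend` is hypothetical and is not taken from the FAEST code base, whose actual dispatch mechanism may differ.

```cpp
#include <string>

// Hypothetical helper (not part of the FAEST sources): returns a description
// of the ISA extensions this translation unit was compiled with, based on
// compiler-predefined macros only. It performs no runtime CPU detection.
inline std::string compiled_isa_backend() {
#if defined(__x86_64__) && defined(__AVX2__) && defined(__AES__)
    // Typically enabled with -mavx2 -maes (or -march=native on a capable CPU).
    return "x86-64 with AVX2 and AES-NI";
#elif defined(__aarch64__) && \
    (defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO))
    // Typically enabled with -march=armv8-a+aes or similar.
    return "AArch64 with AES";
#else
    return "portable fallback (required extensions not enabled at compile time)";
#endif
}
```

Calling `compiled_isa_backend()` in a build made with, for example, `-mavx2 -maes` on x86-64 would be expected to select the first branch; without those flags it falls through to the last one.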

x86-64 (with AVX2 and AES-NI)

We measured the performance using a single core of a workstation with an AMD Ryzen 9 5950X (Zen 3) processor at 3.4 GHz (with clock boosting disabled) and 128 GiB of memory. The system was otherwise idle (load average 0.01), so although Simultaneous Multi-Threading was enabled, it likely did not affect the results significantly. Each individual test can be run with memory usage below 19 MiB. The computer was running Linux 6.6.40, and the implementations were built with GCC 14.1.1.

                                                     Runtimes  Sizes in Bytes
FAEST Variant          KeyGen            Sign          Verify   sk  pk    sig
                   ms    Mcyc      ms    Mcyc      ms    Mcyc
128s            0.002   0.005   3.761  12.787   2.877   9.783   32  32   4506
128f            0.002   0.005   0.507   1.722   0.415   1.413   32  32   5924
192s            0.003   0.011  16.084  54.687  12.438  42.290   40  48  11260
192f            0.003   0.011   2.072   7.045   1.788   6.079   40  48  14948
256s            0.004   0.013  22.450  76.330  21.925  74.546   48  48  20696
256f            0.004   0.013   3.256  11.071   3.012  10.241   48  48  26548
EM-128s         0.002   0.005   2.766   9.403   2.176   7.398   32  32   3906
EM-128f         0.002   0.005   0.413   1.404   0.327   1.113   32  32   5060
EM-192s         0.003   0.009  11.553  39.282  10.659  36.239   48  48   9340
EM-192f         0.003   0.009   1.523   5.177   1.372   4.665   48  48  12380
EM-256s         0.004   0.013  18.372  62.465  17.570  59.738   64  64  17984
EM-256f         0.004   0.013   2.775   9.436   2.566   8.725   64  64  23476

AArch64 (with AES)

For ARM, we benchmarked on a MacBook Pro with an Apple M1 processor at up to 3.2 GHz.

                                                     Runtimes  Sizes in Bytes
FAEST Variant          KeyGen            Sign          Verify   sk  pk    sig
                   ms    Mcyc      ms    Mcyc      ms    Mcyc
128s            0.169   0.540   4.740  15.169   3.718  11.898   32  32   4506
128f            0.170   0.545   0.700   2.239   0.504   1.613   32  32   5924
192s            0.148   0.474  18.116  57.971  11.747  37.590   40  48  11260
192f            0.161   0.514   1.917   6.134   1.707   5.464   40  48  14948
256s            0.158   0.507  23.107  73.942  21.350  68.321   48  48  20696
256f            0.119   0.380   3.014   9.645   3.190  10.208   48  48  26548
EM-128s         0.168   0.537   3.855  12.336   2.675   8.560   32  32   3906
EM-128f         0.158   0.504   0.571   1.826   0.400   1.279   32  32   5060
EM-192s         0.160   0.511  11.333  36.267   9.957  31.862   48  48   9340
EM-192f         0.165   0.528   1.535   4.911   1.206   3.860   48  48  12380
EM-256s         0.167   0.535  18.654  59.692  16.334  52.270   64  64  17984
EM-256f         0.163   0.523   2.403   7.691   2.524   8.077   64  64  23476

Reference C Implementation

The reference implementation is slower than the optimized implementation above, but follows the algorithms given in the specification more closely.

Old Implementations

  • x86-64 C implementation with AVX2, AES-NI, and other ISA extensions for the NIST Round 1 submission. Superseded by the C++ version above.

  • Initial Rust implementation for our Crypto 2023 paper. Note that this is for an older version of our protocol, which uses different primitives and is incompatible with the specification.