Dishes
Optimized C++ Implementation
The optimized, architecture-specific implementation supports x86-64 and AArch64 and uses ISA extensions to accelerate AES and other operations.
x86-64 (with AVX2 and AES-NI)
We measured performance on a single core of a workstation with an AMD Ryzen 9 5950X (Zen 3) processor running at 3.4 GHz (with clock boosting disabled) and 128 GiB of memory. The system was otherwise idle (load average 0.01), so although Simultaneous Multi-Threading was enabled, it likely did not affect the results significantly. Each individual test can be run with memory usage below 19 MiB. The computer was running Linux 6.6.40, and the implementations were built with GCC 14.1.1.
| FAEST Variant | KeyGen (ms) | KeyGen (Mcyc) | Sign (ms) | Sign (Mcyc) | Verify (ms) | Verify (Mcyc) | sk (bytes) | pk (bytes) | sig (bytes) |
|---|---|---|---|---|---|---|---|---|---|
| 128s | 0.002 | 0.005 | 3.761 | 12.787 | 2.877 | 9.783 | 32 | 32 | 4506 |
| 128f | 0.002 | 0.005 | 0.507 | 1.722 | 0.415 | 1.413 | 32 | 32 | 5924 |
| 192s | 0.003 | 0.011 | 16.084 | 54.687 | 12.438 | 42.290 | 40 | 48 | 11260 |
| 192f | 0.003 | 0.011 | 2.072 | 7.045 | 1.788 | 6.079 | 40 | 48 | 14948 |
| 256s | 0.004 | 0.013 | 22.450 | 76.330 | 21.925 | 74.546 | 48 | 48 | 20696 |
| 256f | 0.004 | 0.013 | 3.256 | 11.071 | 3.012 | 10.241 | 48 | 48 | 26548 |
| EM-128s | 0.002 | 0.005 | 2.766 | 9.403 | 2.176 | 7.398 | 32 | 32 | 3906 |
| EM-128f | 0.002 | 0.005 | 0.413 | 1.404 | 0.327 | 1.113 | 32 | 32 | 5060 |
| EM-192s | 0.003 | 0.009 | 11.553 | 39.282 | 10.659 | 36.239 | 48 | 48 | 9340 |
| EM-192f | 0.003 | 0.009 | 1.523 | 5.177 | 1.372 | 4.665 | 48 | 48 | 12380 |
| EM-256s | 0.004 | 0.013 | 18.372 | 62.465 | 17.570 | 59.738 | 64 | 64 | 17984 |
| EM-256f | 0.004 | 0.013 | 2.775 | 9.436 | 2.566 | 8.725 | 64 | 64 | 23476 |
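The millisecond and megacycle columns above appear to be related by the fixed 3.4 GHz clock: 1 ms at 1 GHz corresponds to one million cycles, so Mcyc ≈ ms × GHz. The short sketch below checks this for a few Sign entries copied from the table. The helper name `ms_to_mcyc` is hypothetical, and the assumption that the cycle counts correspond to the nominal clock (rather than a separately read cycle counter) is ours, not something the measurements state.

```python
def ms_to_mcyc(ms: float, clock_ghz: float) -> float:
    """Convert a runtime in milliseconds to millions of cycles.

    Assumes a fixed clock: 1 ms at 1 GHz = 1e-3 s * 1e9 cycles/s = 1 Mcyc.
    """
    return ms * clock_ghz


# Clock frequency stated for the x86-64 benchmark machine (boosting disabled).
CLOCK_GHZ = 3.4

# (variant, Sign ms, Sign Mcyc) copied from the x86-64 table.
rows = [
    ("128s", 3.761, 12.787),
    ("128f", 0.507, 1.722),
    ("256f", 3.256, 11.071),
]

for variant, ms, reported_mcyc in rows:
    estimated = ms_to_mcyc(ms, CLOCK_GHZ)
    # Small differences are expected because the ms column is rounded.
    print(f"{variant}: estimated {estimated:.3f} Mcyc, reported {reported_mcyc:.3f} Mcyc")
```

The estimates agree with the reported values to within the rounding of the millisecond column, which makes it easy to rescale the numbers when comparing against measurements taken at a different clock frequency.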
AArch64 (with AES)
For AArch64, we benchmarked on a MacBook Pro with an Apple M1 processor running at up to 3.2 GHz.
| FAEST Variant | KeyGen (ms) | KeyGen (Mcyc) | Sign (ms) | Sign (Mcyc) | Verify (ms) | Verify (Mcyc) | sk (bytes) | pk (bytes) | sig (bytes) |
|---|---|---|---|---|---|---|---|---|---|
| 128s | 0.169 | 0.540 | 4.740 | 15.169 | 3.718 | 11.898 | 32 | 32 | 4506 |
| 128f | 0.170 | 0.545 | 0.700 | 2.239 | 0.504 | 1.613 | 32 | 32 | 5924 |
| 192s | 0.148 | 0.474 | 18.116 | 57.971 | 11.747 | 37.590 | 40 | 48 | 11260 |
| 192f | 0.161 | 0.514 | 1.917 | 6.134 | 1.707 | 5.464 | 40 | 48 | 14948 |
| 256s | 0.158 | 0.507 | 23.107 | 73.942 | 21.350 | 68.321 | 48 | 48 | 20696 |
| 256f | 0.119 | 0.380 | 3.014 | 9.645 | 3.190 | 10.208 | 48 | 48 | 26548 |
| EM-128s | 0.168 | 0.537 | 3.855 | 12.336 | 2.675 | 8.560 | 32 | 32 | 3906 |
| EM-128f | 0.158 | 0.504 | 0.571 | 1.826 | 0.400 | 1.279 | 32 | 32 | 5060 |
| EM-192s | 0.160 | 0.511 | 11.333 | 36.267 | 9.957 | 31.862 | 48 | 48 | 9340 |
| EM-192f | 0.165 | 0.528 | 1.535 | 4.911 | 1.206 | 3.860 | 48 | 48 | 12380 |
| EM-256s | 0.167 | 0.535 | 18.654 | 59.692 | 16.334 | 52.270 | 64 | 64 | 17984 |
| EM-256f | 0.163 | 0.523 | 2.403 | 7.691 | 2.524 | 8.077 | 64 | 64 | 23476 |
Reference C Implementation
The reference implementation is slower than the optimized implementation above, but it follows the algorithms given in the specification more closely.
Old Implementations
- x86-64 C implementation with AVX2, AES-NI, and other ISA extensions for the NIST Round 1 submission. Superseded by the C++ version above.
- Initial Rust implementation for our Crypto 2023 paper. Note that this targets an older version of our protocol, which uses different primitives and is incompatible with the specification.