Compact Dilithium Implementations on Cortex-M3 and Cortex-M4

Greconici, Denisa O. C.; Kannwischer, Matthias J.; Sprenkels, Daan

doi:10.46586/tches.v2021.i1.1-24

Cited by 31 publications

(27 citation statements)

References 12 publications

“…The first reports higher throughput polynomial multiplication [40] and the second is a performance evaluation between several versions of the NTT, including iterative NTT, parallel NTT, and CUDA-based FFT (cuFFT) for different polynomial sizes [41]. Strictly algorithmic optimizations of the NTT are presented in other works [42][43]. Longa et al [42] show that limiting the coefficient length in polynomials to 32 bits yields an efficient modular reduction technique.…”

Section: Implementation of PQC Algorithms for PKI (mentioning)

confidence: 99%

“…Additionally, the authors use signed integer arithmetic which decreases the number of add operations necessary in both sampling and polynomial multiplication. Greconici et al [43] use signed integer arithmetic to decrease the number of add operations which leads to performance gains in several functions including NTT and SHAKE-128. The authors also employ a merging layers technique in NTT that reduces the number of loads and stores by about a factor of 2.…”

Section: Implementation of PQC Algorithms for PKI (mentioning)

confidence: 99%
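
The layer-merging remark in the statement above can be illustrated with a minimal sketch. It assumes a generic in-place Cooley-Tukey layer structure; the helper names (`ntt_layer`, `ntt_two_layers_merged`, `run_unmerged`, `run_merged`), the placeholder twiddle table `zetas`, and the access counter are hypothetical and are not taken from the cited implementation. Python list accesses stand in for loads and stores; on a Cortex-M core the merged variant would keep the four coefficients in registers between the two layers.

```python
Q = 8380417  # Dilithium modulus, 2^23 - 2^13 + 1


def ntt_layer(a, zetas, layer, counter):
    """Apply one Cooley-Tukey layer in place, counting loads and stores."""
    length = len(a) >> (layer + 1)
    for block in range(1 << layer):
        z = zetas[(1 << layer) + block]
        start = block * 2 * length
        for j in range(start, start + length):
            x, y = a[j], a[j + length]
            counter["loads"] += 2
            t = z * y % Q
            a[j], a[j + length] = (x + t) % Q, (x - t) % Q
            counter["stores"] += 2


def ntt_two_layers_merged(a, zetas, layer, counter):
    """Apply layers `layer` and `layer + 1` with one load/store per coefficient."""
    length = len(a) >> (layer + 1)
    half = length // 2
    for block in range(1 << layer):
        z0 = zetas[(1 << layer) + block]          # twiddle of the first layer
        z1 = zetas[(2 << layer) + 2 * block]      # twiddles of the second layer
        z2 = zetas[(2 << layer) + 2 * block + 1]
        start = block * 2 * length
        for j in range(start, start + half):
            c0, c1 = a[j], a[j + half]
            c2, c3 = a[j + length], a[j + length + half]
            counter["loads"] += 4
            # First layer: butterflies on (c0, c2) and (c1, c3).
            t = z0 * c2 % Q
            c0, c2 = (c0 + t) % Q, (c0 - t) % Q
            t = z0 * c3 % Q
            c1, c3 = (c1 + t) % Q, (c1 - t) % Q
            # Second layer: butterflies on (c0, c1) and (c2, c3).
            t = z1 * c1 % Q
            c0, c1 = (c0 + t) % Q, (c0 - t) % Q
            t = z2 * c3 % Q
            c2, c3 = (c2 + t) % Q, (c2 - t) % Q
            a[j], a[j + half] = c0, c1
            a[j + length], a[j + length + half] = c2, c3
            counter["stores"] += 4


def run_unmerged(data, zetas):
    a, counter = list(data), {"loads": 0, "stores": 0}
    for layer in range(len(a).bit_length() - 1):
        ntt_layer(a, zetas, layer, counter)
    return a, counter


def run_merged(data, zetas):
    # Assumes the number of layers (log2 of the length) is even.
    a, counter = list(data), {"loads": 0, "stores": 0}
    for layer in range(0, len(a).bit_length() - 1, 2):
        ntt_two_layers_merged(a, zetas, layer, counter)
    return a, counter
```

Both variants produce the same coefficients for any twiddle table, while the merged variant touches memory once per coefficient for every two layers instead of once per layer.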

Key Distribution for Post Quantum Cryptography using Physical Unclonable Functions

Cambou, Gowanlock, Yıldız, et al. 2021

Preprint

Lattice and code cryptography can replace existing schemes such as Elliptic Curve Cryptography because of their resistance to quantum computers. In support of public key infrastructures, the distribution, validation and storage of the cryptographic keys then become more complex because the keys are longer. This paper describes practical ways to generate keys from physical unclonable functions (PUFs), for both lattice and code based cryptography. Handshakes between client devices containing the PUFs and a server are used to select sets of addressable positions in the PUFs, from which streams of bits called seeds are generated on demand. The public and private cryptographic key pairs are computed from these seeds together with additional streams of random numbers. The method allows the server to independently validate the public key generated by the PUF, and act as a certificate authority in the network. Technologies such as high performance computing and graphics processing units can further enhance security by preventing attackers from performing this independent validation when equipped only with less powerful computers.

“…The faster computation will be added in the eprint version. On Cortex-M3, our 16-bit and 32-bit butterflies are from [GKS21]. For solving CRT, we follow the AVX2 implementation in [CHK+21].…”

Section: NTTs For Matrixvectormul (mentioning)

confidence: 99%

“…A 32-bit CT butterfly is to proceed with addsub of $(a_0, b a_1)$ [ACC+21]. Although the 32-bit butterfly from [GKS21] gives the same functionality, we implement the 32-bit butterfly from [ACC+21] for a smaller code size.…”

Section: -Bit CT Butterflies (mentioning)

confidence: 99%
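
As a minimal sketch of the butterfly described in the statement above: the product b·a_1 is reduced, then added to and subtracted from a_0. The statement does not say which reduction is used; signed Montgomery reduction with R = 2^32 and the Dilithium modulus are assumed here, and the helper names (`to_int32`, `montgomery_reduce`, `to_montgomery`, `ct_butterfly`) are hypothetical. Python integers emulate the 32-bit and 64-bit registers of the target.

```python
Q = 8380417                  # Dilithium modulus, 2^23 - 2^13 + 1
QINV = pow(Q, -1, 1 << 32)   # Q^-1 mod 2^32 (needs Python 3.8+)


def to_int32(x):
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def montgomery_reduce(a):
    """Return r with r * 2^32 == a (mod Q) and |r| < Q, for |a| < 2^31 * Q."""
    t = to_int32(a * QINV)
    # a - t*Q is a multiple of 2^32, so the shift is an exact division.
    return (a - t * Q) >> 32


def to_montgomery(b):
    """Precompute b * 2^32 mod Q, as one would for a twiddle-factor table."""
    return (b << 32) % Q


def ct_butterfly(a0, a1, b_mont):
    """Cooley-Tukey butterfly: returns (a0 + b*a1, a0 - b*a1) modulo Q.

    Signed arithmetic is used throughout, so no multiple of Q has to be
    added before the subtraction to keep the result non-negative.
    """
    t = montgomery_reduce(b_mont * a1)
    return a0 + t, a0 - t
```

For example, `ct_butterfly(a0, a1, to_montgomery(b))` returns two values congruent to `a0 + b*a1` and `a0 - b*a1` modulo `Q`; they are not fully reduced, which is the usual lazy-reduction trade-off.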

“…Here we have two natural alternatives in NTT-based polynomial multiplication using only 16-bit multiplications. One can use 32-bit NTTs but emulate the long multiplications (used already to implement Dilithium which requires 32-bit NTTs [GKS21]). Or one can adopt the approach of the AVX2 implementation of [CHK+21] and use two 16-bit NTTs which can be efficiently implemented while avoiding long multiplications.…”

Section: Introduction (mentioning)

confidence: 99%
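
The second alternative in the statement above computes the product modulo two 16-bit primes and then recombines the residues with the Chinese remainder theorem (CRT). A minimal sketch of that recombination step follows. The primes 7681 and 12289 are illustrative assumptions, not necessarily the moduli used in the cited work, and `crt_centered` is a hypothetical helper name.

```python
P0 = 7681    # illustrative 16-bit NTT-friendly prime (assumption)
P1 = 12289   # illustrative 16-bit NTT-friendly prime (assumption)
P0_INV_MOD_P1 = pow(P0, -1, P1)  # needs Python 3.8+


def crt_centered(r0, r1):
    """Recover x from x mod P0 and x mod P1, assuming |x| < P0 * P1 / 2."""
    # Solve x = r0 + P0 * u with x == r1 (mod P1).
    u = ((r1 - r0) * P0_INV_MOD_P1) % P1
    x = r0 + P0 * u
    # Map from [0, P0*P1) to the centered range so negative values survive.
    if x > (P0 * P1) // 2:
        x -= P0 * P1
    return x
```

For instance, with coefficients `a = 3000` and `b = -5000`, multiplying the residues modulo each prime and calling `crt_centered` recovers the integer product `a * b`, because its absolute value is below `P0 * P1 / 2`.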

Multi-moduli NTTs for Saber on Cortex-M3 and Cortex-M4

Abdulrahman, Chen, Chen, et al. 2021

TCHES

Self Cite

The U.S. National Institute of Standards and Technology (NIST) has designated ARM microcontrollers as an important benchmarking platform for its Post-Quantum Cryptography standardization process (NISTPQC). In view of this, we explore the design space of the NISTPQC finalist Saber on the Cortex-M4 and its close relation, the Cortex-M3. In the process, we investigate various optimization strategies and memory-time tradeoffs for number-theoretic transforms (NTTs). Recent work by [Chung et al., TCHES 2021 (2)] has shown that NTT multiplication is superior compared to Toom–Cook multiplication for unprotected Saber implementations on the Cortex-M4 in terms of speed. However, it remains unclear if NTT multiplication can outperform Toom–Cook in masked implementations of Saber. Additionally, it is an open question if Saber with NTTs can outperform Toom–Cook in terms of stack usage. We answer both questions in the affirmative. Additionally, we present a Cortex-M3 implementation of Saber using NTTs outperforming an existing Toom–Cook implementation. Our stack-optimized unprotected M4 implementation uses around the same amount of stack as the most stack-optimized Toom–Cook implementation while being 33%-41% faster. Our speed-optimized masked M4 implementation is 16% faster than the fastest masked implementation using Toom–Cook. For the Cortex-M3, we outperform existing implementations by 29%-35% in speed. We conclude that for both stack- and speed-optimization purposes, one should base polynomial multiplications in Saber on the NTT rather than Toom–Cook for the Cortex-M4 and Cortex-M3. In particular, in many cases, multi-moduli NTTs perform best.
