2020
DOI: 10.46586/tches.v2021.i1.1-24

Compact Dilithium Implementations on Cortex-M3 and Cortex-M4

Abstract: We present implementations of the lattice-based digital signature scheme Dilithium for ARM Cortex-M3 and ARM Cortex-M4. Dilithium is one of the three signature finalists of the NIST post-quantum cryptography competition. As our Cortex-M4 target, we use the popular STM32F407-DISCOVERY development board. Compared to the previous speed records on the Cortex-M4 by Ravi, Gupta, Chattopadhyay, and Bhasin, we speed up the key operations NTT and NTT⁻¹ by 20%, which together with other optimizations results in speedups o…

Cited by 31 publications (27 citation statements)
References 12 publications
“…The first reports higher throughput polynomial multiplication [40] and the second is a performance evaluation between several versions of the NTT, including iterative NTT, parallel NTT, and CUDA-based FFT (cuFFT) for different polynomial sizes [41]. Strictly algorithmic optimizations of the NTT are presented in other works [42][43]. Longa et al [42] show that limiting the coefficient length in polynomials to 32 bits yields an efficient modular reduction technique.…”
Section: Implementation of PQC Algorithms for PKI; citation type: mentioning
confidence: 99%
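The quoted statement does not say which reduction [42] uses. As an illustration only, here is a minimal sketch of one shift-and-subtract reduction that works for primes of the form k·2^m + 1 and keeps values within 32 bits. The modulus `Q = 12289`, the names `K`, `M`, and `k_red` are assumptions made for this example and are not taken from the source.

```python
# Hypothetical parameters: a prime of the form K * 2**M + 1 (not from the source).
Q = 12289
K, M = 3, 12
assert Q == K * 2**M + 1

def k_red(c: int) -> int:
    """Return a value congruent to K * c modulo Q.

    Writing c = c0 + c1 * 2**M gives
    K*c = K*c0 + c1*(Q - 1), which is congruent to K*c0 - c1 modulo Q,
    so only a mask, a shift, a small multiply, and a subtract are needed.
    """
    c0 = c & ((1 << M) - 1)  # low M bits, always in [0, 2**M)
    c1 = c >> M              # remaining high bits (floor shift, also for negatives)
    return K * c0 - c1
```

The result carries an extra factor `K`, so a caller would have to compensate for it, for example by folding the inverse of `K` into precomputed constants. For a 32-bit non-negative input the output magnitude stays below 2^20, which is why such a reduction fits comfortably in 32-bit registers.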
“…Additionally, the authors use signed integer arithmetic which decreases the number of add operations necessary in both sampling and polynomial multiplication. Greconici et al [43] use signed integer arithmetic to decrease the number of add operations which leads to performance gains in several functions including NTT and SHAKE-128. The authors also employ a merging layers technique in NTT that reduces the number of loads and stores by about a factor of 2.…”
Section: Implementation of PQC Algorithms for PKI; citation type: mentioning
confidence: 99%
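The effect of merging layers on loads and stores can be illustrated with a toy model. The sketch below is not the code of Greconici et al.: the length `N = 16`, the placeholder twiddle table `ZETAS` (not real roots of unity), and the convention of counting every list read or write as one memory operation are all assumptions chosen for this example. It runs the same butterfly network once layer by layer and once with two layers merged, so that four coefficients are loaded, pass through four butterflies, and are stored once.

```python
Q = 8380417  # Dilithium modulus
N = 16       # toy length with an even number of layers (assumption)
ZETAS = [(7 * i + 3) % Q for i in range(N)]  # placeholder twiddles, NOT real roots

def butterfly(lo, hi, zeta):
    """Cooley-Tukey butterfly: (lo, hi) -> (lo + zeta*hi, lo - zeta*hi) mod Q."""
    t = zeta * hi % Q
    return (lo + t) % Q, (lo - t) % Q

def ntt_layer_by_layer(a):
    a, mem_ops = list(a), 0
    length = N // 2
    while length >= 1:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[N // (2 * length) + start // (2 * length)]
            for j in range(start, start + length):
                a[j], a[j + length] = butterfly(a[j], a[j + length], zeta)
                mem_ops += 4  # 2 loads + 2 stores per butterfly
        length //= 2
    return a, mem_ops

def ntt_two_layers_merged(a):
    a, mem_ops = list(a), 0
    length = N // 2
    while length >= 2:
        half = length // 2
        for start in range(0, N, 2 * length):
            z_top = ZETAS[N // (2 * length) + start // (2 * length)]
            z_lo = ZETAS[N // length + start // length]
            z_hi = ZETAS[N // length + start // length + 1]
            for j in range(start, start + half):
                # 4 loads, then both layers are computed "in registers"
                x0, x1 = a[j], a[j + half]
                x2, x3 = a[j + length], a[j + length + half]
                x0, x2 = butterfly(x0, x2, z_top)  # first layer
                x1, x3 = butterfly(x1, x3, z_top)
                x0, x1 = butterfly(x0, x1, z_lo)   # second layer
                x2, x3 = butterfly(x2, x3, z_hi)
                a[j], a[j + half] = x0, x1         # 4 stores
                a[j + length], a[j + length + half] = x2, x3
                mem_ops += 8
        length //= 4
    return a, mem_ops
```

Under this counting model both functions return the same coefficients, while the merged variant performs half as many memory operations, which matches the "about a factor of 2" stated in the quoted text. On a real Cortex-M core the number of layers that can be merged is limited by the available registers.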
“…The faster computation will be added in the eprint version. On Cortex-M3, our 16-bit and 32-bit butterflies are from [GKS21]. For solving CRT, we follow the AVX2 implementation in [CHK+21].…”
Section: NTTs for MatrixVectorMul; citation type: mentioning
confidence: 99%
“…A 32-bit CT butterfly is to proceed with addsub of (a_0, b·a_1) [ACC+21]. Although the 32-bit butterfly from [GKS21] gives the same functionality, we implement the 32-bit butterfly from [ACC+21] for a smaller code size.…”
Section: -Bit CT Butterflies; citation type: mentioning
confidence: 99%
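The "addsub of (a_0, b·a_1)" structure can be sketched as follows. This is a hedged illustration, not the code from [ACC+21] or [GKS21]: computing the product with a signed Montgomery multiplication, the choice of the Dilithium modulus, and the helper names `to_int32`, `montgomery_reduce`, and `ct_butterfly` are assumptions made for this example. Python integers are unbounded, so `to_int32` emulates the wrap-around of a 32-bit register.

```python
Q = 8380417                 # Dilithium modulus (assumed for this sketch)
R = 1 << 32                 # Montgomery radix for 32-bit words
QINV = pow(Q, -1, R)        # Q^{-1} mod 2^32

def to_int32(x: int) -> int:
    """Emulate truncation to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def montgomery_reduce(a: int) -> int:
    """For |a| < 2^31 * Q, return r congruent to a * 2^-32 mod Q with |r| < Q."""
    t = to_int32(a * QINV)
    return (a - t * Q) >> 32  # a - t*Q is divisible by 2^32, so the shift is exact

def ct_butterfly(a0: int, a1: int, b_mont: int):
    """Cooley-Tukey butterfly: multiply a1 by the twiddle, then add and subtract.

    b_mont is the twiddle b in Montgomery form, i.e. b * 2^32 mod Q.
    The outputs are signed and not reduced further (lazy reduction).
    """
    t = montgomery_reduce(b_mont * a1)
    return a0 + t, a0 - t
```

A short usage example with an arbitrary twiddle value:

```python
b = 1234
s, d = ct_butterfly(123456, -654321, b * R % Q)
# s is congruent to a0 + b*a1 and d to a0 - b*a1 modulo Q
```

Because the values are signed, the subtraction `a0 - t` needs no added multiple of `Q` to stay in range.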