With the recent advances in quantum computing, code-based cryptography is foreseen to be one of the few mathematical approaches for designing quantum-resistant public-key cryptosystems. Binary polynomial multiplication dominates the computation time of the primitives in such cryptosystems, so efficient multipliers are crucial to the performance of post-quantum public-key cryptographic solutions. This manuscript presents a flexible template architecture for the hardware implementation of large binary polynomial multipliers. The architecture combines the iterative application of the Karatsuba algorithm, which minimizes the number of required partial products, with the Comba algorithm, which optimizes the schedule of their computation. In particular, the proposed multiplier architecture supports operands on the order of tens of thousands of bits, and it offers a wide range of performance-resource trade-offs that is independent of the size of the input operands. To demonstrate the effectiveness of our solution, we employed the nine configurations of the LEDAcrypt public-key cryptosystem as representative use cases for large-degree binary polynomial multiplications. For each configuration, we showed that our template architecture can deliver a performance-optimized multiplier implementation for each FPGA of the Xilinx Artix-7 mid-range family. The experimental validation, performed by implementing our multiplier for all the LEDAcrypt configurations on the Artix-7 12 and 200 FPGAs, i.e., the smallest and the largest devices of the Artix-7 family, demonstrated average performance gains of 3.6x and 33.3x, respectively, over an optimized software implementation employing the gf2x C library.
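The Karatsuba step mentioned in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware architecture: it assumes binary polynomials are packed into Python integers (bit i is the coefficient of x^i), uses a recursive rather than iterative formulation, and omits the Comba scheduling entirely. The function names and the `threshold` parameter are hypothetical, chosen only for this illustration.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x] via shift-and-XOR (baseline)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2)[x] is XOR
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, nbits: int, threshold: int = 64) -> int:
    """Karatsuba product in GF(2)[x] for operands of at most nbits bits.

    Three half-size products replace the four needed by the schoolbook
    split. 'threshold' (an assumed tuning knob, must be >= 1) selects
    the operand size at which recursion falls back to the baseline.
    """
    if nbits <= threshold:
        return clmul_schoolbook(a, b)
    half = (nbits + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba_gf2(a_lo, b_lo, half, threshold)
    hi = karatsuba_gf2(a_hi, b_hi, nbits - half, threshold)
    # (a_lo + a_hi)(b_lo + b_hi) = lo + hi + cross terms, all over GF(2)
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half, threshold)
    mid ^= lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))
```

As a quick check, `karatsuba_gf2(0b11, 0b11, 2, threshold=1)` computes (x + 1)^2 = x^2 + 1, i.e. `0b101`, since the cross terms cancel modulo 2.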
Within code-based cryptography, quasi-cyclic low-density parity-check (QC-LDPC) codes are foreseen as one of the few solutions for designing post-quantum cryptosystems. The bit-flipping algorithm is at the core of the decoding procedure of such codes when they are used to design cryptosystems. An effective design must account for the computational complexity of the decoding and for the code size required to ensure the security margin against attacks led by quantum computers. To this end, it is of paramount importance to deliver efficient and flexible hardware implementations to support quantum-resistant public-key cryptosystems, since available software solutions cannot deliver the required performance. This manuscript proposes an efficient and scalable architecture for the implementation of the bit-flipping procedure targeting large QC-LDPC codes for post-quantum cryptography. To demonstrate the effectiveness of our solution, we employed the nine configurations of the LEDAcrypt cryptosystem as representative use cases for QC-LDPC codes suitable for post-quantum cryptography. For each configuration, our template architecture can deliver a performance-optimized decoder implementation for all the FPGAs of the Xilinx Artix-7 mid-range family. The experimental results demonstrate that our optimized architecture allows the implementation of large QC-LDPC codes even on the smallest FPGA of the Xilinx Artix-7 family. Considering the implementation of our decoder on the Xilinx Artix-7 200 FPGA, the experimental results show an average speedup of 5x across all the LEDAcrypt configurations, compared to the official optimized software implementation of the decoder that employs the Intel AVX2 extension.
Index Terms: QC-LDPC codes, bit-flipping decoding, code-based cryptography, post-quantum cryptography, applied cryptography, FPGA, hardware design.
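The bit-flipping idea at the core of such decoders can be sketched in a few lines. This is a generic hard-decision bit-flipping loop under stated assumptions, not the LEDAcrypt decoder and not the proposed hardware architecture: the parity-check matrix is given as a hypothetical list of rows (each row lists the bit positions it checks), the flipping `threshold` is a fixed constant rather than a value tuned per iteration, and the toy code below is neither quasi-cyclic nor of cryptographic size.

```python
def bit_flip_decode(checks, received, max_iters=10, threshold=2):
    """Generic hard-decision bit-flipping decoding (illustrative only).

    checks    : list of rows; each row lists the bit indices in that check
    received  : list of 0/1 values, possibly containing errors
    threshold : flip a bit when at least this many of its checks fail
    Returns (word, ok), where ok is True if all checks are satisfied.
    """
    word = list(received)
    for _ in range(max_iters):
        # Syndrome: one parity bit per check, 1 means the check fails
        syndrome = [sum(word[j] for j in row) % 2 for row in checks]
        if not any(syndrome):
            return word, True
        # Count, for every bit, how many failing checks it takes part in
        upc = [0] * len(word)
        for row, failed in zip(checks, syndrome):
            if failed:
                for j in row:
                    upc[j] += 1
        # Flip every bit that reaches the threshold
        for j, count in enumerate(upc):
            if count >= threshold:
                word[j] ^= 1
    syndrome = [sum(word[j] for j in row) % 2 for row in checks]
    return word, not any(syndrome)


# Toy code (an assumption for this example): 6 bits, 4 checks,
# every bit is covered by exactly two checks.
TOY_CHECKS = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5]]
```

With `TOY_CHECKS`, the word `[1, 1, 0, 1, 0, 0]` satisfies all four checks; flipping its last bit makes exactly two checks fail, both of which contain that bit, so a single iteration with `threshold=2` restores the original word.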
For the energy-cap problem in battery-powered devices, DVFS and power gating represent the de facto state-of-the-art actuators. However, the limited margin available to reduce the operating voltage, the impossibility of massively integrating such actuators on-chip, and their actuation latency force a revision of such design methodologies. We present an all-digital architecture and a design methodology that can effectively manage the energy-cap problem for CPUs and accelerators. Two quality metrics are put forward to capture the performance loss and the energy budget violations. We employed a vector processor supporting 4 hardware threads as a representative use case. Results show an average performance loss and energy-cap violations limited to 2.9% and 3.8%, respectively. Compared to solutions employing the DFS actuator, our all-digital architecture reduces energy-cap violations by 3x while maintaining a similar performance loss.
Index Terms: Energy-constrained design, low power, digital design, RTL design, multi-core, power management.