Sangpyo Kim scite author profile

Fully Homomorphic encryption (FHE) has been gaining in popularity as an emerging means of enabling an unlimited number of operations in an encrypted message without decryption. A major drawback of FHE is its high computational cost. Specifically, a bootstrapping step that refreshes the noise accumulated through consequent FHE operations on the ciphertext can even take minutes of time. This significantly limits the practical use of FHE in numerous real applications.By exploiting the massive parallelism available in FHE, we demonstrate the first instance of the implementation of a GPU for bootstrapping CKKS, one of the most promising FHE schemes supporting the arithmetic of approximate numbers. Through analyzing CKKS operations, we discover that the major performance bottleneck is their high main-memory bandwidth requirement, which is exacerbated by leveraging existing optimizations targeted to reduce the required computation. These observations motivate us to utilize memory-centric optimizations such as kernel fusion and reordering primary functions extensively.Our GPU implementation shows a 7.02× speedup for a single CKKS multiplication compared to the state-of-the-art GPU implementation and an amortized bootstrapping time of 0.423us per bit, which corresponds to a speedup of 257× over a single-threaded CPU implementation. By applying this to logistic regression model training, we achieved a 40.0× speedup compared to the previous 8-thread CPU implementation with the same data.

show abstract

Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs

Kim

Jung

Park

et al. 2020

View full text Add to dashboard Cite

Homomorphic encryption (HE) draws huge attention as it provides a way of privacy-preserving computations on encrypted messages. Number Theoretic Transform (NTT), a specialized form of Discrete Fourier Transform (DFT) in the finite field of integers, is the key algorithm that enables fast computation on encrypted ciphertexts in HE. Prior works have accelerated NTT and its inverse transformation on a popular parallel processing platform, GPU, by leveraging DFT optimization techniques. However, these GPU-based studies lack a comprehensive analysis of the primary differences between NTT and DFT or only consider small HE parameters that have tight constraints in the number of arithmetic operations that can be performed without decryption. In this paper, we analyze the algorithmic characteristics of NTT and DFT and assess the performance of NTT when we apply the optimizations that are commonly applicable to both DFT and NTT on modern GPUs. From the analysis, we identify that NTT suffers from severe main-memory bandwidth bottleneck on large HE parameter sets. To tackle the main-memory bandwidth issue, we propose a novel NTT-specific on-the-fly root generation scheme dubbed on-the-fly twiddling (OT). Compared to the baseline radix-2 NTT implementation, after applying all the optimizations, including OT, we achieve 4.2× speedup on a modern GPU.

show abstract

Accelerating Fully Homomorphic Encryption Through Architecture-Centric Analysis and Optimization

et al. 2021

View full text Add to dashboard Cite

Homomorphic Encryption (HE) draws a significant attention as a privacy-preserving way for cloud computing because it allows computation on encrypted messages called ciphertexts. Among numerous HE schemes proposed, HE for Arithmetic of Approximate Numbers (HEAAN) is rapidly gaining popularity across a wide range of applications as it supports messages that can tolerate approximate computation with no limit on the number of arithmetic operations applicable to the ciphertexts. A critical shortcoming of HE is the high computation complexity of ciphertext arithmetic; especially, HE multiplication (HE Mul) is more than 10,000 times slower than the corresponding multiplication between unencrypted messages. This leads to a large body of HE acceleration studies including ones exploiting FPGAs; however, those did not conduct a rigorous analysis of computational complexity and data access patterns of HE Mul. Moreover, the proposals mostly focused on designs with small parameter sizes, making it difficult to accurately estimate the performance of the HE accelerators in conducting a series of complex arithmetic operations. In this paper, we first describe how HE Mul of HEAAN is performed in a manner friendly to non-crypto experts. Then we conduct a disciplined analysis on its computational and memory-access characteristics, through which we (1) extract parallelism in the key functions composing HE Mul and (2) demonstrate how to effectively map the parallelism to the popular parallel processing platforms, CPUs and GPUs, by applying a series of optimizations such as transposing matrices and pinning data to threads. This leads to the performance improvement of HE Mul on a CPU and a GPU by 2.06× and 4.05×, respectively, over the reference HEAAN running on a CPU with 24 threads. INDEX TERMSComputer applications, Computer architecture, Cryptography, Multicore processing I. INTRODUCTION

show abstract

SHARP: A Short-Word Hierarchical Accelerator for Robust and Practical Fully Homomorphic Encryption

Kim

Ahn

et al. 2023

View full text Add to dashboard Cite

BTS

Kim

et al. 2022

View full text Add to dashboard Cite

Homomorphic encryption (HE) enables the secure offloading of computations to the cloud by providing computation on encrypted data (ciphertexts). HE is based on noisy encryption schemes in which noise accumulates as more computations are applied to the data. The limited number of operations applicable to the data prevents practical applications from exploiting HE. Bootstrapping enables an unlimited number of operations or fully HE (FHE) by refreshing the ciphertext. Unfortunately, bootstrapping requires a significant amount of additional computation and memory bandwidth as well. Prior works have proposed hardware accelerators for computation primitives of FHE. However, to the best of our knowledge, this is the first to propose a hardware FHE accelerator that supports bootstrapping as a first-class citizen.In particular, we propose BTS Ð Bootstrappable, Technologydriven, Secure accelerator architecture for FHE. We identify the challenges of supporting bootstrapping in the accelerator and analyze the off-chip memory bandwidth and computation required. In particular, given the limitations of modern memory technology, we identify the HE parameter sets that are efficient for FHE acceleration. Based on the insights gained from our analysis, we propose BTS, which effectively exploits the parallelism innate in HE operations by arranging a massive number of processing elements in a grid. We present the design and microarchitecture of BTS, including a network-on-chip design that exploits a deterministic communication pattern. BTS shows 5,556× and 1,306× improved execution time on ResNet-20 and logistic regression over a CPU, with a chip area of 373.6mm 2 and up to 163.2W of power.

show abstract

Accelerating Fully Homomorphic Encryption Through Microarchitecture-Aware Analysis and Optimization

Jung

Lee

Kim

et al. 2021

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Sangpyo Kim

Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs

Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs

Accelerating Fully Homomorphic Encryption Through Architecture-Centric Analysis and Optimization

SHARP: A Short-Word Hierarchical Accelerator for Robust and Practical Fully Homomorphic Encryption

BTS

Accelerating Fully Homomorphic Encryption Through Microarchitecture-Aware Analysis and Optimization

Contact Info

Product

Resources

About