Homomorphic encryption (HE) offers great capabilities that can solve a wide range of privacy-preserving computing problems. This tool allows anyone to process encrypted data producing encrypted results that only the decryption key’s owner can decrypt. Although HE has been realized in several public implementations, its performance is quite demanding. The reason for this is attributed to the huge amount of computation required by secure HE schemes. In this work, we present a CUDAbased implementation of the Fan and Vercauteren (FV) Somewhat HomomorphicEncryption (SHE) scheme. We demonstrate several algebraic tools such as the Chinese Remainder Theorem (CRT), Residual Number System (RNS) and Discrete Galois Transform (DGT) to accelerate and facilitate FV computation on GPUs. We also show how the entire FV computation can be done on GPU without multi-precision arithmetic. We compare our GPU implementation with two mature state-of-the-art implementations: 1) Microsoft SEAL v2.3.0-4 and 2) NFLlib-FV. Our implementation outperforms them and achieves on average 5.37x, 7.37x, 22.22x, 5.11x and 13.18x (resp. 2.03x, 2.94x, 27.86x, 8.53x and 18.69x) for key generation, encryption, decryption, homomorphic addition and homomorphic multiplication against SEAL-FVRNS (resp. NFLlib-FV).
Homomorphic encryption is an emerging form of encryption that provides the ability to compute on encrypted data without ever decrypting them. Potential applications include aggregating sensitive encrypted data on a cloud environment and computing on the data in the cloud without compromising data privacy. There have been several recent advances resulting in new homomorphic encryption schemes and optimized variants. We implement and evaluate the performance of two optimized variants, namely Bajard-Eynard-Hasan-Zucca (BEHZ) and Halevi-Polyakov-Shoup (HPS), of the most promising homomorphic encryption scheme in CPU and GPU. The most interesting (and also unexpected) result of our performance evaluation is that the HPS variant in practice scales significantly better (typically by 15%-30%) with increase in multiplicative depth of the computation circuit than BEHZ, implying that the HPS variant will always outperform BEHZ for most practical applications. For the multiplicative depth of 98, our fastest GPU implementation performs homomorphic multiplication in 51 ms for 128-bit security settings, which is faster by two orders of magnitude than prior results and already practical for cloud environments supporting GPU computations. Large multiplicative depths supported by our implementations are required for applications involving deep neural networks, logistic regression learning, and other important machine learning problems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.