Fast Implementation of Curve25519 Using AVX2

Faz-Hernández, Armando; López, Julio

doi:10.1007/978-3-319-22174-8_18

Cited by 19 publications

(9 citation statements)

References 5 publications


“…In the former case, c is absorbed by one of the additions in Step 15; if this does not happen, then the latter case arises and the carry is absorbed by the addition in Step 17. This shows that the algorithm terminates without any overflow and at the end of the algorithm we have 0 ≤ h (14), (19) and 21 As in the case of reduceSLMP, for the correctness of reduceSLPMP, it is not required to have η = 64. The value of η = 64 is used for 64-bit implementation and the algorithm can equally well be used with η-bit arithmetic for any value of η (say η = 32 or η = 128).…”

Section: Algorithm 4 Reduction For Saturated Limb Representation Per (mentioning)

confidence: 89%

“…From (13), (14), (15) and (16), we have $h^{(2)}(\theta) \equiv h^{(0)}(\theta) \bmod p$ and $h^{(2)}(\theta)$ has a $(\kappa, \eta, \nu + 1)$-representation.…”

Section: Algorithm 4 Reduction For Saturated Limb Representation Per (mentioning)

confidence: 99%

“…For each reduction algorithm, we state precise theorems about their correctness and provide detailed proofs of correctness. Works on implementation of Curve25519 provide reduction methods for the prime $2^{255} - 19$ [3,9,13,14], though without proofs of correctness. We note that there are excellent discussions on Barrett and Montgomery reductions available in the literature [10,24,15] which also describe reduction methods for specific primes.…”

Section: Introduction (mentioning)

confidence: 99%


Efficient arithmetic in (pseudo-)Mersenne prime order fields

Nath, Sarkar. 2022. AMC

Elliptic curve cryptography is based upon elliptic curves defined over finite fields. Operations over such elliptic curves require arithmetic over the underlying field. In particular, fast implementations of multiplication and squaring over the finite field are required for performing efficient elliptic curve cryptography. The present work considers the problem of obtaining efficient algorithms for field multiplication and squaring. From a theoretical point of view, we present a number of algorithms for multiplication/squaring and reduction which are appropriate for different settings. Our algorithms collect together and generalize ideas which are scattered across various papers and codes. At the same time, we also introduce new ideas to improve upon existing works. A key theoretical feature of our work is that we provide formal statements and detailed proofs of correctness of the different reduction algorithms that we describe. On the implementation aspect, a total of fourteen primes are considered, covering all previously proposed cryptographically relevant (pseudo-)Mersenne prime order fields at various security levels. For each of these fields, we provide 64-bit assembly implementations of the relevant multiplication and squaring algorithms targeted towards two different modern Intel architectures. We were able to find previous 64-bit implementations for six of the fourteen primes considered in this work. On the Haswell and Skylake processors of Intel, for all the six primes where previous implementations are available, our implementations outperform such previous implementations.
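To make the idea of reduction modulo a pseudo-Mersenne prime concrete, here is a minimal Python sketch for $p = 2^{255} - 19$. It relies only on the identity $2^{255} \equiv 19 \pmod p$ and works on Python integers; it is not the saturated-limb algorithm of the cited paper, and the function name `reduce_pseudo_mersenne` is a hypothetical one chosen for illustration.

```python
P = 2**255 - 19
MASK_255 = (1 << 255) - 1


def reduce_pseudo_mersenne(x: int) -> int:
    """Reduce a non-negative integer modulo 2^255 - 19 by folding.

    Writing x = lo + 2^255 * hi and using 2^255 = 19 (mod p) gives
    x = lo + 19 * hi (mod p), which is strictly smaller whenever hi > 0.
    """
    if x < 0:
        raise ValueError("expected a non-negative integer")
    # Fold the bits above position 255 back into the low part.
    while x >> 255:
        x = (x & MASK_255) + 19 * (x >> 255)
    # Now 0 <= x < 2^255 < 2p, so one conditional subtraction suffices.
    if x >= P:
        x -= P
    return x
```

For a product of two field elements (about 510 bits) the loop runs only a couple of times. A limb-based implementation would perform the same folding on machine words and would have to track carries explicitly, which is where the correctness proofs mentioned in the abstract come in.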


“…For example, the so-called ladder-step operation of the Montgomery ladder for Montgomery curves [14] can be implemented in a 2-way or 4-way parallel fashion so that two or four field operations are carried out in parallel, as described in e.g. [6, Algorithm 1] and [12, Fig. 1] for AVX2.…”

Section: Introduction (mentioning)

confidence: 99%
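As a rough illustration of the ladder step mentioned in this statement, the sketch below follows the x-only Montgomery ladder formulas given in RFC 7748, written with plain Python integers. The names `ladder_step` and `x25519_int` are hypothetical, the grouping of operations is only commented on (not vectorized), and branching on scalar bits as done here is not constant time, so this is a reading aid rather than an implementation to deploy.

```python
P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4, the constant used in RFC 7748


def ladder_step(x1, x2, z2, x3, z3):
    """One combined doubling and differential addition (RFC 7748 formulas)."""
    a, b = (x2 + z2) % P, (x2 - z2) % P
    c, d = (x3 + z3) % P, (x3 - z3) % P
    # These four multiplications/squarings do not depend on each other,
    # which is the kind of independence a 4-way SIMD layout could exploit.
    aa, bb = a * a % P, b * b % P
    da, cb = d * a % P, c * b % P
    e = (aa - bb) % P
    x5 = (da + cb) ** 2 % P
    z5 = x1 * (da - cb) ** 2 % P
    x4 = aa * bb % P
    z4 = e * (aa + A24 * e) % P
    return x4, z4, x5, z5


def x25519_int(k: int, u: int) -> int:
    """Scalar multiplication on Curve25519 using integer inputs/outputs."""
    # Clamp the scalar: clear the 3 low bits and bit 255, set bit 254.
    k = (k & ((1 << 255) - 8)) | (1 << 254)
    x1 = u % P
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:  # NOT constant time; illustration only
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        x2, z2, x3, z3 = ladder_step(x1, x2, z2, x3, z3)
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    # Convert from projective (X : Z) to affine x via Fermat inversion.
    return x2 * pow(z2, P - 2, P) % P
```

A quick sanity check is the Diffie-Hellman property: for two scalars `a` and `b`, `x25519_int(a, x25519_int(b, 9))` should equal `x25519_int(b, x25519_int(a, 9))`.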

High-Throughput Elliptic Curve Cryptography Using AVX2 Vector Instructions

Cheng, Großschädl, Tian, et al. 2021. Selected Areas in Cryptography

Single-Instruction-Multiple-Data (SIMD) extensions like Intel's AVX2 offer a great potential to accelerate elliptic curve cryptography compared to a straightforward implementation using only base x64 instructions. All existing AVX2 implementations of scalar multiplication on Curve25519 and alternative elliptic curves are optimized for low latency. We argue in this paper that many applications, most notably server-side TLS handshake processing, would benefit more from throughput-optimized implementations than latency-optimized ones. To support this argument we introduce throughput-optimized AVX2 implementations of variable-base scalar multiplication on Curve25519 and fixed-base scalar multiplication on Ed25519. Both implementations perform four scalar multiplications in parallel, whereby each scalar multiplication uses a 64-bit element of a 256-bit AVX2 vector. The field arithmetic is based on a radix-$2^{29}$ representation of the field elements, which makes it possible to execute four parallel multiplications modulo a multiple of $p = 2^{255} - 19$ in just 88 Skylake cycles. Four variable-base scalar multiplications on Curve25519 require less than 250,000 Skylake cycles, which translates into a throughput of 32,318 scalar multiplications per second at a clock frequency of 2 GHz. For comparison, the currently best latency-optimized AVX2 implementation reaches a throughput of only about 21,000 scalar multiplications per second on the same Skylake processor.
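The radix-$2^{29}$ layout described in this abstract can be modelled in a few lines of Python. The sketch below assumes nine 29-bit limbs per field element (9 × 29 = 261 ≥ 255 bits) and uses a list of four elements to stand in for the four 64-bit lanes of an AVX2 vector. All function names are hypothetical, and the reduction step of the paper is not reproduced; only the limb split and the schoolbook column sums are shown.

```python
P = 2**255 - 19
RADIX_BITS = 29
NUM_LIMBS = 9  # 9 * 29 = 261 bits, enough for a 255-bit field element
LIMB_MASK = (1 << RADIX_BITS) - 1


def to_limbs(x: int) -> list[int]:
    """Split x (< 2^261) into nine 29-bit limbs, least significant first."""
    return [(x >> (RADIX_BITS * i)) & LIMB_MASK for i in range(NUM_LIMBS)]


def from_limbs(limbs: list[int]) -> int:
    """Recombine limbs (or wider column sums) into a single integer."""
    return sum(v << (RADIX_BITS * i) for i, v in enumerate(limbs))


def mul_columns(f: list[int], g: list[int]) -> list[int]:
    """Schoolbook product of two limb vectors as 17 unreduced column sums.

    Each limb product is below 2^58 and a column adds at most 9 of them,
    so every column stays below 9 * 2^58 < 2^62 and fits a 64-bit lane.
    """
    cols = [0] * (2 * NUM_LIMBS - 1)
    for i in range(NUM_LIMBS):
        for j in range(NUM_LIMBS):
            cols[i + j] += f[i] * g[j]
    return cols


def mul_columns_4way(fs: list[list[int]], gs: list[list[int]]) -> list[list[int]]:
    """Scalar model of four independent multiplications, one per lane."""
    return [mul_columns(f, g) for f, g in zip(fs, gs)]
```

The headroom in each 64-bit lane is what allows products to be accumulated before any carry propagation. As a rough check of the quoted throughput, four scalar multiplications in 250,000 cycles at 2 GHz correspond to 4 × 2·10⁹ / 250,000 = 32,000 per second, so the reported 32,318 implies a cycle count of roughly 247,500.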


“…Faz-Hernández and López [22] utilized efficient arithmetic operations on the prime field using AVX2, with performance benchmarked on the Intel Haswell processor. Faz-Hernández and López [23] proposed an efficient implementation of an elliptic curve (Curve25519) using AVX2. They proposed an accelerated prime field and elliptic curve arithmetic using AVX2.…”

Section: Related Work On Cryptographic Algorithm (mentioning)

confidence: 99%

Secure Data Encryption for Cloud-Based Human Care Services

et al. 2018


Sensor network services utilize sensor data from low-end IoT devices of the types widely deployed over long distances. After the collection of sensor data, the data is delivered to the cloud server, which processes it to extract useful information. Given that the data may contain sensitive and private information, it should be encrypted and exchanged through the network to ensure integrity and confidentiality. Under these circumstances, a cloud server should provide high-speed data encryption without a loss of availability. In this paper, we propose efficient parallel implementations of Simeck family block ciphers on modern 64-bit Intel processors. In order to accelerate the performance, an adaptive encryption technique is also exploited for load balancing of the resulting big data. Finally, the proposed implementations achieved 3.5 cycles/byte and 4.6 cycles/byte for Simeck32/64 and Simeck64/128 encryption, respectively.
