2014
DOI: 10.1007/978-3-662-43414-7_24

Montgomery Multiplication Using Vector Instructions

Abstract: In this paper we present a parallel approach to compute interleaved Montgomery multiplication. This approach is particularly suitable to be computed on 2-way single instruction, multiple data platforms as can be found on most modern computer architectures in the form of vector instruction set extensions. We have implemented this approach for tablet devices which run the x86 architecture (Intel Atom Z2760) using SSE2 instructions as well as devices which run on the ARM platform (Qualcomm MSM8960, NVID…
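
For readers unfamiliar with the term, the following is a minimal sketch of word-by-word (interleaved) Montgomery multiplication, written with Python big integers rather than vector instructions. It is not the authors' code: the function name, the parameter names and the 32-bit word size are assumptions chosen for illustration, and the inputs are assumed to satisfy 0 <= a, b < m with m odd and m < 2^(word_bits * n_words).

```python
def montgomery_multiply(a, b, m, n_words, word_bits=32):
    """Return a * b * R^-1 mod m, where R = 2^(word_bits * n_words).

    Hypothetical helper for illustration. Assumes m is odd,
    m < R and 0 <= a, b < m. Requires Python 3.8+ for pow(m, -1, r).
    """
    r = 1 << word_bits
    mask = r - 1
    # Classical Montgomery constant: mu = -m^-1 mod 2^word_bits.
    mu = (-pow(m, -1, r)) % r
    c = 0
    for i in range(n_words):
        # Take the i-th word of a and add its partial product.
        a_i = (a >> (word_bits * i)) & mask
        c += a_i * b
        # Choose q so that c + q * m is divisible by 2^word_bits.
        q = (mu * (c & mask)) & mask
        # Interleave the reduction: the division is an exact shift.
        c = (c + q * m) >> word_bits
    # At this point c < 2 * m, so one conditional subtraction suffices.
    if c >= m:
        c -= m
    return c
```

Multiplication and reduction alternate inside a single loop, which is what "interleaved" refers to; each iteration consumes one word of `a` and removes one word from the accumulator.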

Citations: cited by 24 publications (41 citation statements)
References: 28 publications (37 reference statements)
“…In [12], Pabbuleti et al implemented the NIST-recommended prime-field curve including P192 and P224 on the Snapdragon APQ8060 within 404, 405 clock cycles via applying multiplicand reduction method into SIMD-based machine. Recently, in SAC'13, a different approach to split the Montgomery multiplication into two parts, being computed in parallel, was introduced [6]. They flip the sign of the precomputed Montgomery constant and accumulate the result in two separate intermediate values that are computed concurrently while avoiding a redundant representation.…”
Section: Previous Work (mentioning)
confidence: 99%
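
The sign flip described in the statement above can be sketched as follows. This is an illustration under stated assumptions, not the cited implementation: the function and variable names are hypothetical, 32-bit words are assumed, and the two accumulators `d` and `e` are updated sequentially with Python big integers, whereas a 2-way SIMD version would keep them in separate vector lanes. The constant is taken as +m^-1 mod 2^32 instead of the usual -m^-1, so the reduction term is subtracted at the end rather than added in every iteration.

```python
def montgomery_multiply_two_streams(a, b, m, n_words, word_bits=32):
    """Return a * b * R^-1 mod m using two separate accumulators.

    Hypothetical sketch. Assumes m is odd, m < R = 2^(word_bits * n_words)
    and 0 <= a, b < m. Requires Python 3.8+ for pow(m, -1, r).
    """
    r = 1 << word_bits
    mask = r - 1
    # Sign-flipped Montgomery constant: mu = +m^-1 mod 2^word_bits.
    mu = pow(m, -1, r)
    d = 0  # accumulates the products a_j * b
    e = 0  # accumulates the multiples q * m
    for j in range(n_words):
        a_j = (a >> (word_bits * j)) & mask
        d += a_j * b
        # Pick q so that e + q * m has the same low word as d.
        q = (mu * ((d - e) & mask)) & mask
        e += q * m
        # The low words agree, so shifting both keeps d - e exact.
        d >>= word_bits
        e >>= word_bits
    # Both accumulators stay below m, so d - e lies in (-m, m).
    c = d - e
    if c < 0:
        c += m
    return c
```

Because each accumulator stays below `m` on its own, neither needs extra carry bits, which is one way to read the statement's remark about avoiding a redundant representation.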
“…Firstly, we re-organized operands by conducting transpose operation, which can efficiently shuffle inner vector by 32-bit wise. Instead of a normal order ((B[0], B[1]), (B[2], B[3]), (B[4], B[5]), (B[6], B[7])), we actually classify the operand as groups ((B[0], B[4]), (B[2], B[6]), (B[1], B[5]), (B[3], B[7])) for computing multiplication where each operand ranges from 0 to $2^{32} - 1$ (0xffff ffff in hexadecimal form). Secondly, multiplication [7])) where the results are located from 0 to $2^{64} - 2^{33} + 1$ (0xffff fffe 0000 0001).…”
Section: Cascade Operand Scanning Multiplication For SIMD (mentioning)
confidence: 99%
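
The bounds quoted in the statement above can be checked directly. The sketch below is only an illustration: the helper names are hypothetical, and the index pairs in `regroup` are copied from the quoted grouping rather than derived from any particular transpose instruction.

```python
def max_product(word_bits=32):
    """Largest product of two unsigned words of the given width."""
    word_max = (1 << word_bits) - 1
    return word_max * word_max


def lane_headroom(word_bits=32):
    """Space left in a double-width lane above the largest product."""
    lane_max = (1 << (2 * word_bits)) - 1
    return lane_max - max_product(word_bits)


def regroup(words):
    """Pair eight words in the order given in the quoted statement."""
    order = [(0, 4), (2, 6), (1, 5), (3, 7)]
    return [(words[i], words[j]) for i, j in order]


# The largest 32 x 32-bit product matches the quoted upper bound.
assert max_product() == (1 << 64) - (1 << 33) + 1 == 0xFFFFFFFE00000001
# The headroom equals exactly two maximal 32-bit words.
assert lane_headroom() == 2 * ((1 << 32) - 1)
```

The second assertion shows why the bound matters: a 64-bit lane can hold one full product plus two further 32-bit values without overflowing.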