Elliptic curve cryptography (ECC) is a branch of public-key cryptography that is widely used for secure data exchange in many resource-constrained devices. This paper presents a novel hardware cryptographic processor for ECC over a general prime field GF(p). At the circuit level, it is optimized through a new parallel modular multiplication algorithm and its efficient hardware architecture, which offers a significant improvement over previously used techniques. At the system level, it is optimized by exploiting the high degree of parallelism available in projective coordinates through four parallel multiplier units. The proposed hardware is implemented on Xilinx Virtex-4 and Virtex-6 field programmable gate arrays (FPGAs). A 256-bit scalar multiplication takes 207.1K clock cycles and completes in 1.43 ms on the Virtex-6 platform and in 2.96 ms on the Virtex-4 platform. The Virtex-6 implementation attains a maximum frequency of 144 MHz and occupies 32.4K look-up tables, whereas the Virtex-4 implementation runs at about 70 MHz and occupies 35.7K slices. The results show that the proposed design significantly reduces both computation time and cycle count compared with other reported designs, which makes it a good choice for many ECC-based schemes.

Keywords: FPGA, elliptic curve cryptography (ECC), modular multiplier

public, while the scalar d is a private parameter. Mathematically, finding the value of d given Q and P is known as the elliptic curve discrete logarithm problem (ECDLP), which is the basis of the mathematical security of all ECC cryptosystems: reversing the EC scalar multiplication operation is computationally hard, provided that the involved parameters are chosen carefully. However, the ECDLP can be bypassed by exploiting algorithmic and implementation weaknesses through so-called side channel attacks (SCA) [7]. SCAs can be mounted against any physical implementation.
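To make the asymmetry behind the ECDLP concrete, the sketch below computes Q = dP on a tiny textbook curve (y^2 = x^3 + 2x + 2 over GF(17), generator (5, 1) of order 19) and then recovers d by exhaustive search. These parameters, and the helper names `point_add`, `scalar_mult`, and `brute_force_dlog`, are illustrative assumptions, not the paper's design: the processor described here targets 256-bit general prime fields and uses projective coordinates, whereas this sketch uses affine coordinates and a field small enough to brute-force. It requires Python 3.8+ for `pow(x, -1, m)`.

```python
from typing import Optional, Tuple

# Toy short Weierstrass curve y^2 = x^3 + A*x + B over GF(P_MOD).
# Textbook-sized parameters chosen for illustration only; NOT secure.
P_MOD = 17
A = 2
B = 2

Point = Optional[Tuple[int, int]]  # None represents the point at infinity


def point_add(p1: Point, p2: Point) -> Point:
    """Affine point addition/doubling on the toy curve."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) = infinity (also covers doubling a point with y = 0)
    if p1 == p2:
        # Tangent slope for point doubling
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        # Chord slope for adding two distinct points
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(d: int, point: Point) -> Point:
    """Compute Q = d*P with left-to-right double-and-add."""
    result: Point = None
    for bit in bin(d)[2:]:
        result = point_add(result, result)  # double
        if bit == "1":
            result = point_add(result, point)  # add
    return result


def brute_force_dlog(q: Point, point: Point, order: int) -> Optional[int]:
    """Solve the ECDLP by exhaustive search: find d with d*P = Q."""
    acc: Point = None
    for k in range(order):
        if acc == q:
            return k
        acc = point_add(acc, point)
    return None


G = (5, 1)   # generator of the toy curve's group (assumed example)
ORDER = 19   # order of G on this toy curve

Q = scalar_mult(13, G)                   # forward direction: a handful of group operations
print(Q, brute_force_dlog(Q, G, ORDER))  # reverse direction: exhaustive search over all k
```

The forward direction costs roughly log2(d) doublings and additions, while the exhaustive search walks through up to the full group order. With a 256-bit group order that search is infeasible, which is why the security rests on the ECDLP.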
For example, an attacker who somehow gains access to a cryptographic device may be able to reveal d by monitoring the device's timing and power consumption profiles. The simplest and most common SCAs are timing analysis and simple power analysis (SPA) [8,9].

Several hardware architectures have been developed to compute the EC scalar multiplication operation efficiently [10–25]. Among these, [10,11] are based on the ECs and prime fields recommended by the National Institute of Standards and Technology (NIST) [26], while all other designs support any general prime field GF(p). Nearly all reported EC scalar multiplier hardware architectures are listed in [27,28]. NIST-based designs typically offer better performance, but they are less flexible because they cannot operate over a general GF(p). All of these designs implement the EC scalar multiplier using the standard Weierstrass representation of ECs.
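The timing and simple power analysis attacks mentioned earlier can be illustrated with a minimal sketch. It assumes an attacker can distinguish a point doubling ("D") from a point addition ("A") in a timing or power trace. In plain double-and-add, an addition occurs only for 1-bits of d, so the operation sequence spells out the key. The Montgomery ladder, a well-known countermeasure named here for illustration and not claimed to be the paper's method, performs one addition and one doubling per bit regardless of its value. To keep the example self-contained, integers modulo a hypothetical N stand in for EC points, with `+` playing the role of point addition. All names are illustrative assumptions.

```python
from typing import List, Tuple

# Stand-in additive group: integers mod N (hypothetical, for illustration only).
# "acc + acc" models EC point doubling and "acc + p" models EC point addition.
N = 1_000_003


def double_and_add(d: int, p: int) -> Tuple[int, List[str]]:
    """Left-to-right double-and-add; returns d*p and the operation trace."""
    acc, trace = 0, []
    for bit in bin(d)[2:]:
        acc = (acc + acc) % N
        trace.append("D")
        if bit == "1":          # key-dependent operation: this is the leak
            acc = (acc + p) % N
            trace.append("A")
    return acc, trace


def recover_bits(trace: List[str]) -> str:
    """SPA-style key recovery: 'D' followed by 'A' -> bit 1, lone 'D' -> bit 0."""
    bits, i = [], 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "A":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return "".join(bits)


def montgomery_ladder(d: int, p: int) -> Tuple[int, List[str]]:
    """Montgomery ladder: one add and one double per bit (invariant r1 - r0 = p).

    Note: the Python branch itself is not constant-time; real implementations
    use conditional swaps. Only the operation sequence is uniform here.
    """
    r0, r1, trace = 0, p % N, []
    for bit in bin(d)[2:]:
        if bit == "1":
            r0 = (r0 + r1) % N
            r1 = (r1 + r1) % N
        else:
            r1 = (r0 + r1) % N
            r0 = (r0 + r0) % N
        trace += ["A", "D"]
    return r0, trace


secret = 0b1011
_, leaky = double_and_add(secret, 5)
_, uniform = montgomery_ladder(secret, 5)
print(recover_bits(leaky))  # the key bits, read directly from the trace
print(uniform)              # identical A/D pattern for every 4-bit key
```

Both routines compute the same result. Only the ladder's operation pattern is independent of d, at the cost of one extra addition for each 0-bit.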
Contribution
This paper presents a novel, low-latency, flexible EC scalar multiplier architecture over GF(p). In addition to the low latency feature, the pr...