“…Although hardware implementation of binary Montgomery modular multiplication is simple, it is a time-consuming operation. To improve the performance of the Montgomery modular multiplication algorithm and architecture, several hardware implementation methods and computational techniques have been developed, which can be categorized into four groups: using the high-radix technique [11][12][13][14][15][16][17], using a systolic array architecture [18][19][20], using a carry-save addition architecture [11,16,21,22,23], and using a scalable architecture [9,12,24,25,26,27].…”
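
For orientation, the binary (radix-2) form of Montgomery multiplication that the excerpt refers to can be sketched in software as below. This is a minimal illustration under stated assumptions, not the architecture of any cited work: the function name `montgomery_multiply_binary` and the parameter names `a`, `b`, `n`, `k` are hypothetical, the modulus `n` is assumed odd with `n < 2**k`, and both operands are assumed to lie in `[0, n)`. The loop runs once per operand bit and performs full-width additions on each pass, which is the cost that the high-radix, systolic, carry-save, and scalable approaches aim to reduce.

```python
def montgomery_multiply_binary(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n using the bit-serial Montgomery method.

    Assumptions (not taken from the excerpt): n is odd, n < 2**k,
    and 0 <= a, b < n.
    """
    if n % 2 == 0:
        raise ValueError("modulus n must be odd")
    if not (0 <= a < n and 0 <= b < n and n < (1 << k)):
        raise ValueError("require 0 <= a, b < n < 2**k")

    s = 0
    for i in range(k):
        a_i = (a >> i) & 1   # scan a from least to most significant bit
        s += a_i * b         # conditionally add the multiplicand
        if s & 1:            # make s even by adding the odd modulus
            s += n
        s >>= 1              # exact division by 2
    # The loop keeps s < 2n, so a single subtraction completes the reduction.
    if s >= n:
        s -= n
    return s


if __name__ == "__main__":
    n, k = 97, 7                     # hypothetical small modulus, R = 2**7
    a_bar = (5 << k) % n             # 5 in Montgomery form
    b_bar = (7 << k) % n             # 7 in Montgomery form
    prod_bar = montgomery_multiply_binary(a_bar, b_bar, n, k)
    # Multiplying by 1 converts the result back out of Montgomery form.
    print(montgomery_multiply_binary(prod_bar, 1, n, k))
```

The sketch uses ordinary integer addition in place of any particular adder structure; in hardware, the two additions per iteration are where carry propagation dominates the delay.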