IntroductionSquare root is an arithmetic operation which is widely used in various applications such as image and audio processing, scientific computation, computer graphics and digital communications [1-2-3-4]. Recently, there are many researches which implement the square root calculator in hardware like Field Programmable Gate Array (FPGA) in order to achieve high speed computation. The main interest of implementing square root calculation in hardware is to reduce the delays present in its computation, thus producing a very fast computation.In many VLSI applications nowadays, it is vital to create designs which are not only producing correct and accurate results, but also to provide designs with very high processing speed, where the execution delay is typically in the order of a few nanoseconds or even faster, like in the case of a computer graphic applications. However, in order to achieve the best performance from the system, certain criterias like the power consumption and the resources utilization need to be sacrificed.For instance, the computation speed in hardware can be increased by introducing techniques such as pipelining and parallel computing [5]. The latter will certainly speed up the process, however creating multiple parallel path for the same operation will cause the increase in the number of resources (i.e. Adders, Registers) used, and in consequence, the area or the size of the design will also become bigger.This case is certainly becoming an unnecessary issue which is encountered when developing medium-speed and low-speed applications, with the sampling or operating frequency is less than 10MHz, which do not required very high speed computation with very small delays, while still need to utilize the same amount of resources and thus occupying the same area in the hardware. For example, applications like Direct Torque Control (DTC) for machines, which implemented the square root calculation in FPGA to achieve better estimation of flux and torque, only operated at a minimum sampling period of 5 µs [6]. In this case, the redundancy which had been introduced by the parallel computation can be eliminated and therefore, optimizing the resources usage.Besides, other strategies that need be considered is the algorithm to be implemented in hardware, for square root calculator. Unlike other basic mathematical operations such as addition, substraction or multiplication, it is very difficult to implement a square root calculation in