The Hough Transform is a widely used shape-based algorithm for object detection and localization [6], and it can be generalized to parametric curves such as circles. Real-time execution and embedded integration require several optimizations because of the algorithm's large memory and computational requirements. This paper presents an efficient real-time pipelined architecture and an FPGA implementation of our Hough Transform for multi-circle detection. We improved the computation of the center candidates and designed a three-stage pipeline architecture to reduce both the processing latency and the interval between consecutive images. The architecture has been integrated into a Xilinx Zynq-7000 XC7Z020, whose programmable logic is based on Artix-7 FPGA fabric. The complete system uses 78.5 BRAMs, 153 DSP slices, and 21,638 LUTs, and supports a maximum clock frequency of 128.89 MHz. We validated the architecture at a 125 MHz clock frequency and obtained, for a 1920x1080-pixel image, a latency of 33.214 ms and an interval of 16.607 ms between two consecutive images. According to our results, our architecture offers a throughput more than 4 times higher than that of the fastest state-of-the-art architecture.
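To make the circle case concrete, here is a minimal software sketch of the classic (textbook) circle Hough Transform voting over a three-dimensional (a, b, r) parameter space. It is not the improved center-candidate computation or the pipelined FPGA architecture described in this paper. It votes over all angles rather than restricting votes with edge gradient directions, and the function names `hough_circles`, `detect` and `circle_edges` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter


def hough_circles(edge_points, radii, width, height, n_angles=360):
    """Classic circle Hough voting (illustrative sketch, not the paper's method).

    Each edge pixel (x, y) votes for every centre (a, b) lying at distance r
    from it: a = x - r*cos(theta), b = y - r*sin(theta).
    """
    acc = Counter()  # sparse accumulator: (a, b, r) -> number of votes
    for x, y in edge_points:
        for r in radii:
            voted = set()  # at most one vote per (pixel, centre, radius) despite rounding
            for k in range(n_angles):
                theta = 2 * math.pi * k / n_angles
                a = round(x - r * math.cos(theta))
                b = round(y - r * math.sin(theta))
                if 0 <= a < width and 0 <= b < height and (a, b) not in voted:
                    voted.add((a, b))
                    acc[(a, b, r)] += 1
    return acc


def detect(acc, min_votes):
    """Return all (a, b, r) candidates with at least min_votes votes, strongest first.

    Returning every cell above the threshold allows several circles to be detected.
    """
    return sorted((key for key, votes in acc.items() if votes >= min_votes),
                  key=lambda key: -acc[key])


def circle_edges(cx, cy, r, n_angles=360):
    """Rasterise a synthetic circle into a set of integer edge pixels (test input)."""
    return {(round(cx + r * math.cos(2 * math.pi * k / n_angles)),
             round(cy + r * math.sin(2 * math.pi * k / n_angles)))
            for k in range(n_angles)}


if __name__ == "__main__":
    edges = circle_edges(20, 25, 8)
    acc = hough_circles(edges, radii=range(6, 11), width=64, height=64)
    print(detect(acc, min_votes=len(edges)))  # prints [(20, 25, 8)]
```

Every edge pixel of the synthetic circle votes once for its true center and radius, so that accumulator cell is the only one reaching `len(edges)` votes. The sketch also shows where the memory cost comes from: a dense accumulator for a 1920x1080 image holds 1920 × 1080 × R cells, which is 10,368,000 cells for R = 5 candidate radii.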