This paper presents an efficient architecture for circle detection using the Hough transform. The architecture adopts the scanline-based ball detection algorithm for the edge detection stage and the edge-flag algorithm for the voting process. To speed up voting, each circle to be drawn is divided into 16 sub-parts that are computed in parallel. The design stores the edge list in an internal memory block. Simulation results show that VGA-size (640 × 480) benchmark images are processed within 10 ms, which indicates that the proposed architecture can meet the speed requirements of most real-world applications.
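The voting idea can be illustrated in software with a minimal sketch. It assumes a single known radius and a precomputed edge list, and it does not reproduce the scanline-based edge detection, the edge-flag algorithm, or the hardware data path. The names `arc_offsets`, `circle_offsets`, `vote`, and `best_center` are hypothetical. Each voting circle is split into 16 arcs by angle. The arcs are independent of one another, which is what would allow hardware to compute them in parallel; here they are simply evaluated in a loop.

```python
import math

NUM_PARTS = 16  # number of arcs per circle, matching the 16 sub-parts in the abstract


def arc_offsets(r, part, num_parts=NUM_PARTS):
    """Integer (dx, dy) offsets on one arc of a circle of radius r.

    Each arc depends only on (r, part), so the arcs are independent and
    could be evaluated concurrently. This sketch evaluates them in turn.
    """
    start = 2 * math.pi * part / num_parts
    end = 2 * math.pi * (part + 1) / num_parts
    # Sample densely enough that neighbouring samples are about one pixel apart.
    steps = max(1, math.ceil(r * (end - start)))
    offsets = set()
    for i in range(steps + 1):
        t = start + (end - start) * i / steps
        offsets.add((round(r * math.cos(t)), round(r * math.sin(t))))
    return offsets


def circle_offsets(r):
    """Union of all arcs; the set removes endpoints shared by adjacent arcs."""
    offsets = set()
    for part in range(NUM_PARTS):
        offsets |= arc_offsets(r, part)
    return offsets


def vote(edges, r, width, height):
    """Hough voting: every edge pixel votes for all centres at distance r."""
    acc = [[0] * width for _ in range(height)]
    offsets = circle_offsets(r)
    for ex, ey in edges:
        for dx, dy in offsets:
            x, y = ex + dx, ey + dy
            if 0 <= x < width and 0 <= y < height:
                acc[y][x] += 1
    return acc


def best_center(acc):
    """Return (x, y) of the accumulator cell with the most votes."""
    best, best_xy = -1, None
    for y, row in enumerate(acc):
        for x, votes in enumerate(row):
            if votes > best:
                best, best_xy = votes, (x, y)
    return best_xy


# Usage: synthetic edge pixels on a circle centred at (20, 15) with radius 8.
edges = [(20 + dx, 15 + dy) for dx, dy in sorted(circle_offsets(8))]
acc = vote(edges, 8, width=40, height=30)
print(best_center(acc))
```

In this sketch the offsets are computed once per radius and reused for every edge pixel. That is a software convenience and is not claimed to reflect how the proposed architecture organises its computation.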