Summary
Ultrasound imaging has become one of the most widely used modalities in medical diagnosis. However, real-time ultrasound imaging requires a large amount of data transfer and massive computation, and has therefore relied mainly on complex dedicated hardware systems. A recent trend toward software-based approaches built on graphics processing units (GPUs) offers the advantages of flexibility and quick implementation. GPUs have been reported to be excellent accelerators across a wide range of applications. To best exploit the outstanding computational power and high memory bandwidth of a GPU, this paper explores the design space of implementing an ultrasound beamformer on a GPU platform. We expand the design space by applying different optimization strategies to the beamformer, and we discuss performance evaluation results on several GPUs with differing architectural characteristics. The performance analysis shows that, with optimized CUDA code, our real-time GPU-based beamformer can be successfully implemented at 181 frames per second (fps) and with a speedup of 230.6X compared with a single-threaded implementation on a high-performance CPU platform. Copyright © 2014 John Wiley & Sons, Ltd.