Abstract-Face detection is an important aspect for biometrics, video surveillance and human computer interaction. We present a multi-GPU implementation of the Viola-Jones face detection algorithm that meets the performance of the fastest known FPGA implementation. The GPU design offers far lower development costs, but the FPGA implementation consumes less power. We discuss the performance programming required to realize our design, and describe future research directions.