Camera tracking systems based on visual image processing are ineffective in their blind zones. To address this problem, this paper reports the design of an acoustic-enhanced tracking system that combines visual and auditory target tracking. The system performs sound direction estimation and target tracking in real time. Estimating the direction of arrival (DOA) of the sound accompanying the target allows the camera to turn toward a target outside its field of view; this sound-triggered operating mode is a significant supplement to a conventional camera's working state. Since an embedded implementation is necessary given the cost and size constraints of practical applications, we designed a small-aperture array of 7 digital omnidirectional MEMS microphones and built the overall system on an FPGA and an ARM processor. Experiments carried out in a normal indoor environment confirmed that the system performs auditory and visual tracking in real time.

INDEX TERMS visual target tracking, MEMS microphone array, acoustic localization, computer perception.
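The abstract does not specify the DOA algorithm used. As an illustration only, a common building block for small microphone arrays is GCC-PHAT time-delay estimation between microphone pairs; the pairwise delays are then mapped to a direction using the array geometry. The sketch below (a hypothetical helper assuming NumPy, not the paper's implementation) estimates the delay between two microphone channels:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay between two microphone signals
    using GCC-PHAT (phase transform) cross-correlation."""
    n = len(sig) + len(ref)                    # zero-pad to avoid circular wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15                     # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # rearrange so that lag 0 sits in the middle of the search window
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift  # lag in samples
    return shift / fs                          # delay in seconds

# toy check: delay a noise burst by 5 samples between two "microphones"
fs = 48_000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(5), x))[:1024]
tau = gcc_phat(delayed, x, fs)
print(round(tau * fs))  # → 5 (estimated delay in samples)
```

For a 7-microphone array such as the one described above, a delay would typically be estimated for each microphone pair and the DOA obtained by a least-squares fit against the known inter-microphone distances.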