In this paper, a motion estimation method is proposed that fuses audio and video sensor data. The audio system consists of three microphones arranged on a Y-shaped structure, which is mounted on a pan-tilt camera. The camera forms the video system. Together, the audio and video systems enable the 3D position of the sound source to be estimated. Using the position estimates, a motion model consisting of the translational velocity and acceleration of the source is in turn estimated using a Kalman filter. The motion model allows the sound source to be tracked in real time. This fusion estimation system has many potential applications, such as video conferencing and security monitoring for intruders. Simulation results show that the motion estimation is satisfactory.
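
As a rough illustration of the filtering step, the following is a minimal sketch of a constant-acceleration Kalman filter for a single coordinate axis. It is not the formulation used in this paper: the class name `ConstantAccelerationKF`, the diagonal process noise `q`, the measurement noise `r`, the initial covariance, and the choice to filter each axis independently are all assumptions made for illustration. The state holds position, velocity, and acceleration, and only position is measured, as it would be if the fused audio-video position estimate were fed to the filter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


class ConstantAccelerationKF:
    """Hypothetical one-axis filter with state [position, velocity, acceleration]."""

    def __init__(self, dt, q=1e-3, r=1e-2, p0=100.0):
        # State transition for constant acceleration over one time step dt.
        self.F = [[1.0, dt, 0.5 * dt * dt],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]]
        self.q = q  # assumed process noise variance (added on the diagonal)
        self.r = r  # assumed position measurement noise variance
        self.x = [0.0, 0.0, 0.0]
        # Large initial covariance: the starting state is treated as unknown.
        self.P = [[p0 if i == j else 0.0 for j in range(3)] for i in range(3)]

    def predict(self):
        """Propagate the state and covariance one step ahead."""
        F = self.F
        self.x = [sum(F[i][k] * self.x[k] for k in range(3)) for i in range(3)]
        fpft = matmul(matmul(F, self.P), transpose(F))
        self.P = [[fpft[i][j] + (self.q if i == j else 0.0)
                   for j in range(3)] for i in range(3)]

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0, 0])."""
        s = self.P[0][0] + self.r                 # innovation variance
        k = [self.P[i][0] / s for i in range(3)]  # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[i] + k[i] * innovation for i in range(3)]
        row0 = list(self.P[0])
        # P <- (I - K H) P, written out for H = [1, 0, 0].
        self.P = [[self.P[i][j] - k[i] * row0[j] for j in range(3)]
                  for i in range(3)]


if __name__ == "__main__":
    # Synthetic, noise-free source moving at 2 units/s along one axis.
    dt = 0.1
    kf = ConstantAccelerationKF(dt)
    for step in range(1, 201):
        kf.predict()
        kf.update(2.0 * step * dt)
    print("position, velocity, acceleration estimates:", kf.x)
```

Under these assumptions, a 3D tracker could run one such filter per coordinate. A joint nine-state filter would be needed if the measurement errors on the three axes are treated as correlated.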