Abstract-Computer vision, speech, and machine learning technologies play an important role in today's vehicles and are increasingly used to improve both safety and comfort in the car. Driving in particular presents a context in which a user's emotional state plays a significant role: emotions have been found to affect cognitive style and performance, and even mildly positive feelings can have a profound effect on the flexibility and efficiency of thinking and problem solving. In this paper, we review some of the existing approaches for analyzing in-vehicle driver affect using audio and visual cues. We discuss the challenges in developing robust systems and aim to provide insight into the practical realization of such systems. In particular, we present our ongoing efforts in collecting driving data using both a driving simulator and real-world driving testbeds, and propose a multilevel audio-visual fusion scheme that exploits the contextual information often available from co-existing tasks in an intelligent system.