Recognizing the emotions of a user interacting with smart devices such as tablets and mobile phones is a promising computer vision problem. Such devices are used in a variety of applications involving human interaction, including web browsing, multimedia playback, and gaming. We present an emotion recognition framework that analyzes the facial expressions of a mobile phone user while addressing real-world challenges of mobile-captured data, such as variations in lighting, head pose, expression, user/device movement, and limited computational resources. The proposed system includes: (i) a personalized facial point tracking algorithm suited to mobile-captured data; (ii) a temporal filter that pre-selects probable emotional frames from the input sequence for further processing, reducing the processing load; (iii) face registration and operating region selection for a compact facial action unit (AU) representation; (iv) a discriminative feature description of AUs that is robust to illumination changes and face angles; and (v) AU classification and intelligent mapping of the predicted AUs to target emotions. We compare the performance of the proposed emotion recognition (ER) system with key state-of-the-art techniques and show a significant improvement on benchmark databases such as CK+, ISL, FACS, JAFFE, MultiPie, and MindReading, as well as on our internally collected mobile phone data set.
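To make the pipeline concrete, the sketch below illustrates two of the stages named above: a temporal filter that pre-selects probable emotional frames, and the mapping from predicted AUs to target emotions. All function names, the frame-difference threshold, and the AU-to-emotion rules are illustrative assumptions (the rules follow standard FACS-style combinations), not the paper's actual implementation.

```python
# Hypothetical sketch of two pipeline stages; names, thresholds, and
# AU combinations are illustrative, not the paper's implementation.

def temporal_filter(frames, threshold=10.0):
    """Pre-select probable emotional frames: keep a frame when its mean
    absolute pixel difference from the previously kept frame exceeds the
    threshold (a simple proxy for expression change / motion)."""
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        prev = kept[-1]
        diff = sum(abs(a - b) for a, b in zip(frame, prev)) / len(frame)
        if diff > threshold:
            kept.append(frame)
    return kept

# FACS-style AU combinations for a few basic emotions (assumed rules);
# a real system would learn or refine this mapping from data.
EMOTION_RULES = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "surprise":  {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
    "sadness":   {1, 4, 15},     # inner brow raiser + brow lowerer + lip corner depressor
}

def map_aus_to_emotion(predicted_aus):
    """Return the emotion whose rule set overlaps most with the predicted AUs."""
    best, best_score = "neutral", 0.0
    for emotion, rule in EMOTION_RULES.items():
        score = len(rule & predicted_aus) / len(rule)
        if score > best_score:
            best, best_score = emotion, score
    return best
```

For example, a frame sequence with little inter-frame change is thinned to its changing frames, and predicted AUs {6, 12} map to "happiness" under the assumed rules.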