Emotion- and speech-driven facial animation is a useful component of many intelligent systems. Given a speech signal, a recognizer outputs a sequence of phoneme and emotion pairs, from which we compute the corresponding sequence of viseme and expression pairs; these are subsequently transformed into a consistent, synchronized facial animation video. This article introduces a novel facial animation technique that generates realistic human face animation videos from emotional speech. More specifically, we first deploy a multi-label feature selector to extract acoustic features that are sufficiently discriminative for phoneme and emotion pairs, and then compute the corresponding sequence of such pairs. Next, we propose a low-rank active learning paradigm that discovers multiple key facial frames best representing these phoneme and emotion pairs in the feature subspace; theoretically, the designed active learner is highly tolerant to video frame noise. Subsequently, we associate each phoneme and emotion pair with a key facial frame, and the well-known morphing technique fits the associated key frames into a smooth animated facial video by generating multiple transitional frames between each pair of temporally adjacent key frames. Experiments demonstrate that the synthesized facial videos look realistic and smooth, and remain synchronized with a variety of male and female speeches.
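The pipeline summarized above can be sketched as three stages: recognizing phoneme and emotion pairs from speech, selecting one representative key facial frame per pair, and morphing between adjacent key frames. The following is a minimal illustrative sketch, not the authors' implementation: all function names are hypothetical placeholders, the recognizer and key-frame selector are stubs standing in for the multi-label feature selection and low-rank active learning steps, and facial frames are reduced to scalars with linear interpolation standing in for morphing.

```python
# Hypothetical pipeline sketch; names and data layout are illustrative only.

def recognize_pairs(speech_frames):
    """Stub recognizer: map each acoustic frame to a (phoneme, emotion) pair.

    Stands in for the multi-label feature selection and recognition stage.
    """
    return [(f["phoneme"], f["emotion"]) for f in speech_frames]

def select_key_frames(pairs, frame_bank):
    """Stub selector: pick one representative facial frame per pair.

    Stands in for the low-rank active learning key-frame discovery;
    frame_bank is an assumed lookup from pair to facial frame.
    """
    return [frame_bank[p] for p in pairs]

def morph(key_frames, steps=3):
    """Linear interpolation between adjacent key frames, standing in
    for the morphing step that generates transitional frames."""
    video = []
    for a, b in zip(key_frames, key_frames[1:]):
        for t in range(steps):
            alpha = t / steps
            video.append((1 - alpha) * a + alpha * b)
    video.append(key_frames[-1])
    return video

# Toy input: two acoustic frames and a two-entry frame bank (scalars
# stand in for facial images).
speech_frames = [
    {"phoneme": "aa", "emotion": "happy"},
    {"phoneme": "b", "emotion": "happy"},
]
frame_bank = {("aa", "happy"): 0.0, ("b", "happy"): 1.0}

pairs = recognize_pairs(speech_frames)
video = morph(select_key_frames(pairs, frame_bank))
```

With two key frames and `steps=3`, the sketch produces four output frames: the first key frame, two transitional frames, and the final key frame.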