Abstract-In video-based face recognition, a key challenge is exploiting the extra information available in a video, e.g., face, body, and motion identity cues. In addition, different video sequences of the same subject may contain variations in resolution, illumination, pose, and facial expression. These variations contribute to the challenges in designing an effective video-based face-recognition algorithm. We propose a novel multivariate sparse representation method for video-to-video face recognition. Our method simultaneously takes into account correlations as well as coupling information among the video frames, jointly representing all the video data by a sparse linear combination of training data. In addition, we modify our model so that it is robust in the presence of noise and occlusion. Furthermore, we kernelize the algorithm to handle the non-linearities present in video data. Numerous experiments using unconstrained video sequences show that our method is effective and performs significantly better than many state-of-the-art video-based face recognition algorithms in the literature.
I. INTRODUCTION

Though face recognition research [1] has traditionally concentrated on recognition from still images, video-based face recognition has recently gained a lot of traction. Faces are essentially articulating three-dimensional objects. For faces, cues from motion possess useful information in the form of behavioral traits, such as idiosyncratic head movements and gestures, which can potentially aid in recognition tasks. Humans efficiently fuse face, body, and motion cues when recognizing people in video [2]. From a video sequence, effective representations such as three-dimensional face models or super-resolved frames can be estimated. These techniques have the potential to improve recognition results.

While the advantage of using motion information in face videos has been widely recognized, computational models for video-based face recognition have only recently gained attention. In this paper, we consider the problem of video-to-video face recognition, where one is presented with a video sequence and the goal is to recognize the person in the video. A key challenge is exploiting the extra information available in a video. In addition, different video sequences of the same subject may contain variations in resolution, illumination, pose, and facial expression. These variations contribute to the difficulties in designing an effective video-based face recognition algorithm.
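To make the idea of jointly representing all frames of a query video concrete, the following is a minimal sketch of one standard way to enforce a shared sparsity pattern across frames: simultaneous orthogonal matching pursuit (SOMP). This is an illustrative stand-in for the paper's multivariate sparse representation, not the authors' exact algorithm; the dictionary `D` (columns are training face vectors), the frame matrix `Y`, and the sparsity level `k` are all assumed names for this example.

```python
import numpy as np

def somp(D, Y, k):
    """Simultaneous OMP: greedily select k dictionary atoms shared by
    ALL columns of Y, so every video frame is represented by a sparse
    combination of the same few training samples.

    D : (d, n) dictionary with unit-norm columns (training data)
    Y : (d, m) matrix whose columns are the m query video frames
    k : number of atoms (shared support size) to select
    Returns X : (n, m) row-sparse coefficient matrix with D @ X ~= Y.
    """
    residual = Y.copy()
    support = []
    coeffs = np.zeros((0, Y.shape[1]))
    for _ in range(k):
        # Score each atom by its total correlation with the residuals
        # of all frames; this couples the frames' supports together.
        scores = np.abs(D.T @ residual).sum(axis=1)
        scores[support] = -np.inf  # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Jointly refit all frames on the current support by least
        # squares, then update the shared residual.
        Ds = D[:, support]
        coeffs, *_ = np.linalg.lstsq(Ds, Y, rcond=None)
        residual = Y - Ds @ coeffs
    X = np.zeros((D.shape[1], Y.shape[1]))
    X[support, :] = coeffs
    return X
```

In a classification setting, one would compute such a joint representation over the concatenated training data of all subjects and assign the video to the subject whose atoms yield the smallest reconstruction residual, in the spirit of sparse-representation-based classification.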