Abstract. In many real-world video-based face recognition scenarios, videos are captured under unconstrained conditions. This makes recognition very challenging because of low face resolution, varying head pose, and complex lighting. To address this issue, we formulate video-based face recognition as a multiple instance learning (MIL) problem. Specifically, given a pair of videos, we generate a bag composed of all frame pairs drawn from the two videos. The bag is positive if the two videos show the same person and negative otherwise, so the recognition task becomes a binary classification problem in MIL. We then propose a novel MIL algorithm with deep instance selection (MILDIS), which maps each bag into a feature space defined by the selected instances via an instance similarity measure. Our method achieves state-of-the-art performance on the real-world YouTube Faces (YTF) dataset under the restricted protocol.

Keywords: Multiple instance learning, deep learning, video-based face recognition.
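The bag construction and the bag-to-feature mapping described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the MILDIS implementation. Per-frame face features are assumed to be precomputed. Each instance is assumed to be the element-wise absolute difference of the two frame features. The instance similarity is assumed to be a Gaussian kernel, with each bag coordinate taken as the maximum over the bag's instances (the embedding used by MILES). MILDIS's learned deep instance selection is replaced here by a fixed set of prototype instances. The names `make_bag`, `embed_bag`, `prototypes`, and `sigma` are hypothetical.

```python
import numpy as np


def make_bag(video_a: np.ndarray, video_b: np.ndarray) -> np.ndarray:
    """Build one MIL bag from all frame pairs of two videos.

    video_a: (n_a, d) per-frame face features (assumed precomputed).
    video_b: (n_b, d) per-frame face features.
    Returns an (n_a * n_b, d) array; row i * n_b + j is the instance for
    frame i of video_a paired with frame j of video_b.
    Assumption: an instance is the element-wise |f_a - f_b|.
    """
    n_a, n_b = len(video_a), len(video_b)
    left = np.repeat(video_a, n_b, axis=0)  # a0, a0, ..., a1, a1, ...
    right = np.tile(video_b, (n_a, 1))      # b0, b1, ..., b0, b1, ...
    return np.abs(left - right)


def embed_bag(bag: np.ndarray, prototypes: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Map a bag to a fixed-length vector, one coordinate per prototype.

    Coordinate k = max over instances x in the bag of
    exp(-||x - p_k||^2 / sigma^2) (MILES-style; assumed, not from the paper).
    """
    # Squared Euclidean distances between every instance and every prototype: (n, m)
    d2 = ((bag[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    sim = np.exp(-d2 / sigma**2)
    # A bag is as similar to a prototype as its most similar instance.
    return sim.max(axis=0)


# Toy usage with random stand-in features (no real data).
rng = np.random.default_rng(0)
video_a = rng.normal(size=(4, 8))   # 4 frames, 8-dim features
video_b = rng.normal(size=(3, 8))   # 3 frames
bag = make_bag(video_a, video_b)    # 4 * 3 = 12 instances
prototypes = bag[:5]                # hypothetical stand-in for selected instances
z = embed_bag(bag, prototypes)      # 5-dim bag representation
print(bag.shape, z.shape)           # (12, 8) (5,)
```

Because every bag is mapped to a vector of the same length regardless of how many frames the two videos contain, a standard binary classifier (for example, a linear SVM) can then be trained on the embedded positive and negative bags. Whether MILDIS uses this particular kernel or classifier is not stated in the abstract.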