This paper presents an approach for mining visual actions from real-world videos. Given a large set of movies, we aim to automatically extract short video sequences corresponding to visual human actions. First, we identify commonly occurring actions by mining verbs extracted from movie transcripts. Next, we align the transcripts with the videos using subtitles and retrieve video samples for each action of interest. Not all of these samples visually characterize the action, so we propose to rank the retrieved videos by visual consistency. We first explore two unsupervised outlier detection methods: one-class Support Vector Machines (SVM) and finding the densest component of a similarity graph. As an alternative, we show how to obtain and use weak supervision. We investigate a direct use of binary SVMs and propose a novel iterative re-training scheme for Support Vector Regression machines (SVR). Experiments on 144 episodes of the TV series Buffy the Vampire Slayer show: (a) the applicability of our approach to a large set of real-world videos, (b) how visual consistency can be used to rank videos retrieved from text, (c) the added value of random non-action samples, i.e., the importance of weak supervision, and (d) the ability of our iterative SVR re-training algorithm to handle mistakes in the weak supervision. The quality of the resulting rankings is assessed on manually annotated data for six action classes.
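
The unsupervised ranking idea behind the one-class SVM variant can be sketched as follows. This is a minimal toy illustration using scikit-learn with synthetic stand-in descriptors, not the authors' feature pipeline or parameter settings: clips that are visually consistent with the bulk of the retrieved set receive high decision values, while outliers are pushed to the bottom of the ranking.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Toy stand-in for clip descriptors: 40 retrieved clips that truly
# show the action (one compact cluster) plus 5 visually unrelated
# clips, each shifted far away along a different coordinate.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(40, 5))
outliers = rng.normal(0.0, 1.0, size=(5, 5))
outliers[np.arange(5), np.arange(5)] += 8.0
X = np.vstack([inliers, outliers])

# Fit a one-class SVM on all retrieved samples (no labels needed) and
# rank clips by their decision value: visually consistent clips score
# higher, likely outliers score lower.
ocsvm = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.2).fit(X)
scores = ocsvm.decision_function(X)
ranking = np.argsort(-scores)  # most consistent clips first
```

Ranking by the decision value, rather than thresholding it, keeps every retrieved clip in the output while ordering likely mistakes last, which matches the retrieval setting described above.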
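
The weakly supervised alternative, iterative SVR re-training, could be sketched roughly as below. This is a hypothetical illustration with synthetic descriptors and scikit-learn; the specific update rule (replacing the weak positive labels by the regressor's own clipped predictions each round) is an assumption for the sketch, not the authors' exact scheme.

```python
import numpy as np
from sklearn.svm import SVR

# Weak supervision: clips retrieved from text get label 1 (some of
# them are wrong), random non-action clips get label 0.
rng = np.random.default_rng(1)
true_pos = rng.normal(0.0, 1.0, size=(30, 5))   # correct retrievals
false_pos = rng.normal(4.0, 1.0, size=(6, 5))   # mislabeled retrievals
negatives = rng.normal(4.0, 1.0, size=(30, 5))  # random non-action clips
X = np.vstack([true_pos, false_pos, negatives])
y = np.concatenate([np.ones(36), np.zeros(30)])  # noisy weak labels

# Iterative re-training: after each round, replace the weak labels of
# the retrieved samples by the regressor's own (clipped) predictions,
# so samples that look like the non-action pool drift toward low
# scores instead of being trusted as positives.
for _ in range(5):
    svr = SVR(kernel="rbf", gamma=0.05, C=1.0).fit(X, y)
    y[:36] = np.clip(svr.predict(X[:36]), 0.0, 1.0)
scores = svr.predict(X)  # final consistency scores for ranking
```

The random non-action samples play the role described in point (c): they anchor the low end of the regression target, so mislabeled retrievals, which resemble them, lose their incorrect positive labels over the iterations.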