This paper addresses speech emotion recognition from movie audio tracks, using the recently collected Acted Facial Expression in the Wild 5.0 database. The aim is to discriminate among three emotional states: angry, happy, and neutral. We extract a relatively small set of 86 features, a subset of which is not commonly used for emotion recognition. These features are fed to an ensemble classifier that combines random forests with support vector machines. The proposed system achieves an accuracy of 65.63%, outperforming a baseline system that uses a K-nearest neighbor classifier and achieves 56.88%. To verify the suitability of the proposed features, the same ensemble classification scheme is applied to a feature set similar to that employed in the Audio/Visual Emotion Challenge 2011. In that case, an accuracy of 61.25% is achieved with a large set of 1582 features, whereas our 86 features yield a relative improvement of 7.15% in accuracy.
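The 7.15% figure is a relative gain, not a difference in percentage points, and the distinction is easy to miss. The short sketch below recomputes it from the accuracies reported above. The function name `relative_improvement` and the variable names are illustrative assumptions, not part of the original work.

```python
def relative_improvement(new_acc: float, ref_acc: float) -> float:
    """Relative gain of new_acc over ref_acc, expressed in percent."""
    if ref_acc <= 0:
        raise ValueError("reference accuracy must be positive")
    return (new_acc - ref_acc) / ref_acc * 100.0


# Accuracies (in percent) as reported in the abstract.
proposed_86_features = 65.63      # ensemble classifier, 86 features
avec_like_1582_features = 61.25   # same ensemble, 1582 features
knn_baseline = 56.88              # K-nearest neighbor baseline

# Absolute gain in percentage points over the 1582-feature set.
absolute_gain = proposed_86_features - avec_like_1582_features
print(f"absolute gain: {absolute_gain:.2f} points")  # 4.38 points

# Relative gain over the 1582-feature set, matching the reported 7.15%.
relative_gain = relative_improvement(proposed_86_features, avec_like_1582_features)
print(f"relative gain: {relative_gain:.2f}%")  # 7.15%

# The same measure against the K-nearest neighbor baseline (not stated in the abstract).
baseline_gain = relative_improvement(proposed_86_features, knn_baseline)
print(f"relative gain over baseline: {baseline_gain:.2f}%")
```

In other words, the gain over the 1582-feature set is 4.38 percentage points in absolute terms, which corresponds to 7.15% when expressed relative to the 61.25% reference accuracy.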