MEXCA is an easy-to-use tool that researchers can use to capture emotion expressions from multiple modalities in videos. It overcomes limitations of previous software by combining existing unimodal systems to extract three modalities: facial muscle movements, vocalizations, and speech content. It can also track faces and speakers across a video. We describe the software's architecture and features and show how its output can be used to analyze emotion expressions in a Dutch election debate. We believe MEXCA holds great potential for the social sciences because it allows researchers to automatically capture emotion expressions at scale, across individuals and contexts.