Introduction and Background

This article takes as its starting point the understanding, shared by multimodal social semiotics and design-oriented didactics, that learning can be understood as a social, meaning-making process. It follows that modes other than written and spoken language play important roles in students' learning in school, even in language learning. The 'multimodal turn', which focuses attention on the interplay between modes, opens up new ways of understanding the designs of classroom activity (Kress, 2003; Kress and van Leeuwen, 2006; Mills, 2010). These phenomena are not new, but our ways of thinking about them are changing, or 'turning' (Jewitt, 2014a, pp. 3-4), towards modes beyond verbal language.

English as a school subject has a tradition of using visual modes in both first language teaching (Jewitt, 2014a) and second language teaching (Jakobsen, 2015; Skjelbred et al., 2017). Previous research has shown an increase in the use of images in textbooks (Bezemer and Kress, 2009). Furthermore, over recent decades, the written page has developed from a verbal into a visual unit (Baldry and Thibault, 2006; Bezemer and Kress, 2010). The visually organized one-spread layout of textbooks demands an active reader who creates coherence and reading paths (Bezemer and Kress, 2009).

In Norway, English is taught as a foreign language (EFL) or as a second language (L2/ESL), and the two terms tend to be used interchangeably (see, e.g., Røkenes, 2016). The subject has a long tradition of using multimodal resources and activities for learning, ranging from textbooks to film, music and drama (Maagerø and Simonsen, 2006; Scott and Ytreberg, 1990; Simensen, 2007). Multimodality is thus inherent in the English subject in Norway, although it is not an explicit part of the English subject curriculum. Over the past decade, multimodality as a concept has gradually been introduced into curricula in several countries, most notably in