“…Recent predictive coding models of perception suggest that rather than passively categorizing the bottom-up signal, observers make active predictions about what they are likely to hear (and see), and that perception is based on the difference between these predictions and the bottom-up signal (Clark, 2013; Friston, 2005; Kumar et al., 2011; Rao & Ballard, 1999; see McMurray & Jongman, 2011, and Kleinschmidt & Jaeger, 2015, for applications to speech perception). Visual speech information could play a crucial role in such predictive processes (Arnal & Giraud, 2012; van Wassenhove, 2013) because in many cases, preparatory gestures (e.g., closing the lips before a word-initial /b/, raising the tongue before a /d/) are visible before any acoustic signal is produced (Chandrasekaran, Trubanova, Stillittano, Caplier, & Ghazanfar, 2009; Schwartz & Savariaux, 2014). Thus, for the listener, the visual speech signal could set up predictions about what is about to be heard.…”
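To make the logic of this passage concrete, the sketch below illustrates one standard way such accounts are formalized: a visually derived prediction is compared against the incoming acoustic signal, and the precision-weighted prediction error determines how far the percept is pulled toward the sensory evidence. This is a minimal Gaussian cue-integration illustration, not the model of any of the cited papers; the function name, the use of voice-onset time as the acoustic cue, and all numerical values are assumptions introduced here for exposition.

```python
# Minimal sketch (illustrative only, not the cited authors' model):
# precision-weighted prediction error along a single acoustic dimension,
# here taken to be voice-onset time (VOT) in milliseconds.

def perceive(prior_mean, prior_prec, signal, signal_prec):
    """Combine a top-down prediction with a bottom-up signal.

    The prediction error (signal - prior_mean) is weighted by the relative
    precision of the bottom-up signal, as in standard Gaussian formulations
    of predictive coding (e.g., Friston, 2005).
    """
    error = signal - prior_mean                       # prediction error
    gain = signal_prec / (prior_prec + signal_prec)   # precision weighting
    return prior_mean + gain * error                  # posterior estimate

# Hypothetical scenario: a visible lip closure predicts a labial /b/ with a
# short VOT; the acoustic signal then arrives with a somewhat longer VOT.
prior_mean, prior_prec = 10.0, 1 / 25.0      # prediction from visual gesture
acoustic_vot, acoustic_prec = 30.0, 1 / 100.0  # bottom-up acoustic evidence

percept = perceive(prior_mean, prior_prec, acoustic_vot, acoustic_prec)
print(f"perceived VOT ~ {percept:.1f} ms")   # pulled toward the visual prediction
```

Because the visual prediction here is more precise than the acoustic evidence, the resulting percept sits closer to the predicted value; with a noisier or absent visual cue, the same computation would weight the acoustic signal more heavily.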