This study examines the level of anxiety students experience during university ESP classes in three modalities of digital classroom environment, i.e., virtual classrooms that require differing degrees of engagement through video, audio and text-based interaction. In a cross-sectional survey, a total of 184 ESP students from four faculties completed a modified version of the Situational Communication Apprehension Measure (SCAM; McCroskey & Richmond, 1985), which was used to determine the level of anxiety the students report feeling during video, audio and text-based synchronous online ESP classes. The main results indicate that the highest levels of anxiety occurred in classroom contexts where students took part in lessons by means of a camera, with somewhat lower levels of anxiety in contexts where students used the microphone to communicate with the language instructor and the other students. We propose that the main reasons behind these results lie in the overwhelming number of visual and audio cues students are exposed to during online lessons, in particular issues regarding gaze, the mirror effect, and the dissonance of being physically present in one environment while mentally in another.