We analyze the efficacy of a small crowd of naïve human raters in rating engagement during human-machine dialog interactions. Each rater viewed multiple 10-second, thin-slice videos of non-native English speakers interacting with a computer-assisted language learning (CALL) system and rated how engaged or disengaged those callers were while interacting with the automated agent. We examine how the crowd's ratings compare to callers' self-ratings of engagement, and further study how the distribution of these rating assignments varies as a function of whether the automated system or the caller was speaking. Finally, we discuss the potential applications and pitfalls of such a crowdsourced paradigm in designing, developing and analyzing engagement-aware dialog systems.