“…We contend that human speech production is highly variable and comes in many different "styles", which speakers continuously adapt to dynamically changing social (tutoring, chatting, arguing, counseling...), individual (hearing problems, attitude, level of distraction, motivation, familiarity), linguistic (frequency, predictability, surprisal, importance) or environmental settings (external noise, mutual visibility, ...) [11,12,13,14,15,16,17,18]. Due to this inherent contextual embedding, human speech production can never be "neutral" or "perfectly natural", and no speaking style therefore qualifies as a reference signal that a speech event of inherently lower quality, e.g.…”