“…From processing and preparing the incoming audio [19,22,52], to transcribing what the user said [17,62], to understanding what the user meant [19], these stages all share the goal of allowing users to speak naturally and fluently to a system in multiple complex contexts of use. From this point, development focuses on deciding what action the system should take as a result [1,23,24,41,60,61], what exactly the spoken response should be [10,26,27,31,45,48,57], and how that response should sound [10,14,32,46,56,66].…”