“…Viethen & Dale, 2007; van Deemter, Gatt, van der Sluis, & Power, 2012). Recent work in the context of a series of NLG shared tasks, in which participants are required to design algorithms that are developed and tested against a common dataset to enable comparison, has shown that results from these two perspectives may diverge significantly (Belz, Kow, Viethen, & Gatt, 2010). For instance, an algorithm's choice of content for referential descriptions may be very similar to the choices humans make, as shown by its degree of match to corpus data, but this does not imply that the resulting description will be easily resolved by human listeners.…”