Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (ACL-IJCNLP '09), 2009
DOI: 10.3115/1667583.1667676
Validating the web-based evaluation of NLG systems

Abstract: The GIVE Challenge is a recent shared task in which NLG systems are evaluated over the Internet. In this paper, we validate this novel NLG evaluation methodology by comparing the Internet-based results with results we collected in a lab experiment. We find that the results delivered by both methods are consistent, but the Internet-based approach offers the statistical power necessary for more fine-grained evaluations and is cheaper to carry out.

Cited by 3 publications (2 citation statements) · References 6 publications
“…Such effects have been found in previous studies; for example, Krahmer and Swerts (2004) found that Dutch and Italian subjects perceived the role of prosodically linked eyebrow movements differently. However, Koller et al (2009) have recently replicated in a lab-based study the results from an online study of a set of natural-language generation systems. This indicates that the results from Internet-based evaluation of generated output can be reliable despite the diverse subject pool; however, future lab-based experiments may be advisable to confirm this for this particular task.…”
Section: Discussion
confidence: 85%
“…Taxonomies [e.g., 1] and frameworks [e.g., 2] have been proposed, often emphasizing the need to distinguish features of user, agent, and task. It is more common now for evaluation of ECAs, or component models such as natural language generation or text-to-speech (TTS) synthesis systems, to consist of both objective and subjective measures [2][3][4][5][6][7]. There are still instances, however, where data are collected in the absence of the manipulation of specific variables (comparison conditions) or without a control condition [e.g.…”
Section: Introduction
confidence: 99%