2005
DOI: 10.1007/11508373_7

Utilizing Visual Attention for Cross-Modal Coreference Interpretation

Cited by 17 publications (10 citation statements)
References 19 publications
“…Emerging technologies ranging from conversational agents that interact directly with humans on collaborative physical tasks [11], to VR and AR systems that attempt to comprehend spoken references to objects in the environment [8,28], to video-mediated communication systems that track conversation and automatically adapt their views based on what the pairs need to see [26,29], would benefit from more advanced computational models of human referring behavior and an understanding of the ways in which context influences collaborative reference. Furthermore, new technologies that provide lightweight mobile eye tracking capabilities [7] are quickly becoming available as platforms for collaborative technologies that reside in everyday physical settings away from the desktop.…”
Section: Reference and Technology (mentioning)
Confidence: 99%
“…Eye gaze has been explored in automated language understanding such as speech recognition [4,14], reference resolution [3,13], and recently for word acquisition [10,22]. Given speech paired with eye gaze information and video images, a translation model was used to acquire words by associating acoustic phone sequences with visual representations of objects and actions [22].…”
Section: Related Work (mentioning)
Confidence: 99%
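The gaze-for-reference-resolution idea that this statement (and the next one) describes can be made concrete with a toy scorer. The sketch below is illustrative only and is not the method of the cited paper or of any citing work; the fixation record format, the 1.5-second lookback window, and the linear recency weighting are all assumptions chosen for the example. It ranks candidate referents by how much visual attention each object received shortly before a referring expression was uttered.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    object_id: str   # object the gaze landed on
    start: float     # seconds from session start
    end: float       # seconds from session start

def score_referents(fixations, utterance_time, window=1.5):
    """Rank candidate referents for a referring expression.

    Each object's score is its total fixation time inside the
    `window` seconds preceding `utterance_time`, weighted so that
    more recent fixations count more (linear recency weighting).
    """
    window_start = utterance_time - window
    scores = {}
    for f in fixations:
        # Clip the fixation to the scoring window.
        s = max(f.start, window_start)
        e = min(f.end, utterance_time)
        if e <= s:
            continue  # fixation lies entirely outside the window
        # Recency weight: 0 at the window's left edge, 1 at utterance onset.
        midpoint = (s + e) / 2.0
        weight = (midpoint - window_start) / window
        scores[f.object_id] = scores.get(f.object_id, 0.0) + (e - s) * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    gaze = [
        Fixation("mug", 2.0, 2.6),
        Fixation("stapler", 2.7, 3.1),
        Fixation("mug", 3.2, 3.4),
    ]
    # Speaker says "that one" at t = 3.5 s; rank the candidates.
    print(score_referents(gaze, utterance_time=3.5))
```

In the systems these statements survey, a gaze salience signal of this kind is combined with linguistic evidence from the speech channel rather than used on its own, which is what "complement to the speech channel" refers to in the next statement.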
“…Eye gaze as a modality in multimodal interaction goes beyond the function of pointing. In different speech and eye gaze systems, eye gaze has been explored for the purpose of mutual disambiguation (Tanaka, 1999; Zhang, Imamiya, Go, & Mao, 2004), as a complement to the speech channel for reference resolution (Campana, Baldridge, Dowding, Hockey, Remington, & Stone, 2001; Kaur, Tremaine, Huang, Wilder, Gacovski, Flippo, & Mantravadi, 2003; Prasov & Chai, 2008; Byron, Mampilly, Sharma, & Xu, 2005) and speech recognition (Cooke, 2006; Qu & Chai, 2007), and for managing human-computer dialogue (Qvarfordt & Zhai, 2005).…”
Section: Related Work (mentioning)
Confidence: 99%