2014
DOI: 10.1109/mmul.2014.29
Joint Video and Text Parsing for Understanding Events and Answering Queries

Abstract: We propose a framework for parsing video and text jointly for understanding events and answering user queries. Our framework produces a parse graph that represents the compositional structures of spatial information (objects and scenes), temporal information (actions and events), and causal information (causalities between events and fluents) in the video and text. The knowledge representation of our framework is based on a spatial-temporal-causal And-Or graph (S/T/C-AOG), which jointly models possible hierarch…
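The abstract describes the S/T/C-AOG as a grammar whose And-nodes compose parts and whose Or-nodes select among alternatives. The sketch below is a hypothetical, minimal illustration of that node structure (the class names, labels, and toy event grammar are invented for illustration and are not from the paper):

```python
from dataclasses import dataclass, field
from enum import Enum

class NodeType(Enum):
    AND = "and"            # composition: all children are present
    OR = "or"              # alternative: one child is selected
    TERMINAL = "terminal"  # atomic object, action, or event

@dataclass
class AOGNode:
    label: str
    kind: NodeType
    children: list = field(default_factory=list)

    def leaves(self):
        """Collect all terminal labels reachable from this node."""
        if self.kind is NodeType.TERMINAL:
            return [self.label]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out

# A toy temporal fragment: a "get-drink" event decomposes (AND) into
# two sub-events, one of which offers alternative realizations (OR).
fetch = AOGNode("fetch-cup", NodeType.TERMINAL)
tap = AOGNode("use-tap", NodeType.TERMINAL)
dispenser = AOGNode("use-dispenser", NodeType.TERMINAL)
fill = AOGNode("fill-cup", NodeType.OR, [tap, dispenser])
get_drink = AOGNode("get-drink", NodeType.AND, [fetch, fill])

print(get_drink.leaves())  # all terminals the grammar can generate
```

A parse graph, in this reading, would be one concrete derivation of such a grammar: each Or-node resolved to a single child, grounded in the video and text evidence.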

Cited by 90 publications (5 citation statements)
References 53 publications
“…We have shown how the learning of scripts, which is central to many of the processes described, can capitalize on the current state of the art in computer vision: the learning of possible sequences of behavior observed in an environment and the encoding of that knowledge in the form of AND-OR graphs (Gupta et al., 2009; Si et al., 2011; Pei et al., 2011; Tu et al., 2014), which can in turn be encoded as scripts. In addition, rapid effective causal learning lies at the heart of learning scripts rapidly (Ho, 2014; Ho, 2016a; Ho & Liausvia, 2013, 2014).…”
Section: Discussion
confidence: 99%
“…With the advent of computer vision and other sensing technologies, scripts can be learned through visual observation (or through other sensory modalities). Recently, there has been some work on using computer vision to observe a scene filled with (human and other) activities and to construct AND-OR graphs that capture the possible sequences of activities that can take place (Gupta, Srinivasan, Shi & Davis, 2009; Si, Pei, Yao & Zhu, 2011; Pei, Jia & Zhu, 2011; Tu, Meng, Lee, Choe & Zhu, 2014).…”
Section: Rapid Learning of Problem-Solving Scripts
confidence: 99%
“…That is, its evaluation shouldn't be as hard as the task itself, and it must not be solvable using shortcuts or cheats. To solve these two problems, we propose the task of visual question answering (VQA) (Antol et al., 2015; Geman et al., 2015; Malinowski and Fritz, 2014; Tu et al., 2014; Bigham et al., 2010; Gao et al., 2015). The task of VQA requires a machine to answer a natural language question about an image, as shown in figure 2.…”
Section: Visual Question Answering
confidence: 99%