We developed Name-It, a system that associates faces and names in news videos. It processes information from the videos and can infer likely name candidates for a given face, or locate a face in news videos by name. To accomplish this task, the system takes a multimodal video-analysis approach: face-sequence extraction and similarity evaluation from videos, name extraction from transcripts, and video-caption recognition.

The Name-It system 1,2 associates names and faces in news videos. Suppose we're watching a TV news program. When people we don't know appear in the video, we can eventually identify most of them by watching the video alone. To do the same, Name-It detects faces in a news video, locates names in the sound track, and then associates each face with the correct name. For face-name association, it uses as many hints as possible based on the structure, context, and meaning of the news video. It needs no additional knowledge, such as newspapers describing the people or biographical dictionaries with pictures. Likewise, Name-It can associate faces in news videos with the right names without an a priori face-name association set; it extracts face-name correspondences from news videos alone.

Name-It takes a multimodal approach to accomplish this task, drawing on several information sources available from news videos: image sequences, transcripts, and video captions. It detects face sequences in the image sequences and extracts name candidates from the transcripts. Transcripts can be obtained from audio tracks with a suitable speech recognition technique, allowing for recognition errors. However, most news broadcasts in the US already carry closed captions, and broadcasts worldwide are trending toward including them, so we use closed-caption text as transcripts for news videos. In addition, we employ video-caption detection and recognition.
We used "CNN Headline News" as the primary news source for our experiments. Given image sequences, transcripts, and video captions as information sources, Name-It associates extracted faces with extracted name candidates using the correlation of their timing information together with face-similarity information; video captions serve as supplementary information. To associate faces and names, Name-It integrates several advanced image-processing and natural-language-processing techniques: face-sequence extraction and similarity evaluation from videos, name extraction from transcripts, and video-caption recognition. Although these technologies aren't always highly accurate, integrating their results helps the system achieve more accurate output.

With respect to face-name association, the Piction system 3 works similarly to Name-It. Piction identifies faces within a given captioned newspaper photograph by extracting faces from the photograph and analyzing the caption to obtain geometric constraints among the faces. The system then labels each face with a name. ...
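The timing-correlation idea above can be sketched in a few lines. The function below is a toy stand-in, not Name-It's actual scoring measure: it scores each (face, name) pair by how often, and how closely, a name mention coincides with a face appearance. All identifiers (face IDs, names, the `window` parameter) are invented for illustration; the real system additionally folds in face similarity and video-caption evidence.

```python
from collections import defaultdict

def cooccurrence_scores(face_times, name_times, window=10.0):
    """Score each (face, name) pair by temporal co-occurrence.

    face_times: dict mapping a face ID to a list of appearance times (sec).
    name_times: dict mapping a name to a list of mention times (sec).
    A mention within `window` seconds of an appearance adds to the score,
    with closer mentions contributing more.
    """
    scores = defaultdict(float)
    for face_id, f_times in face_times.items():
        for name, n_times in name_times.items():
            for ft in f_times:
                for nt in n_times:
                    if abs(ft - nt) <= window:
                        scores[(face_id, name)] += 1.0 - abs(ft - nt) / window
    return dict(scores)
```

Ranking the resulting scores per face then yields the name candidates the abstract mentions.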
For robots to coexist with humans in a social world like ours, it is crucial that they possess human-like social interaction skills. Programming a robot with such skills is a challenging task. In this paper, we propose a Multimodal Deep Q-Network (MDQN) that enables a robot to learn human-like interaction skills through trial and error. The robot gathers data during its interactions with humans and learns interaction behavior from high-dimensional sensory information using end-to-end reinforcement learning. We demonstrate that the robot successfully learned basic interaction skills after 14 days of interacting with people.
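The trial-and-error learning described above rests on the standard Q-learning update, which a deep Q-network such as MDQN approximates with a neural network over high-dimensional sensory input instead of a lookup table. The tabular sketch below shows only that underlying rule; the state and action names are invented for illustration and are not from the paper.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular temporal-difference update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    A deep Q-network replaces the table Q with a neural network."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    td_target = r + gamma * best_next
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q
```

Repeating this update over many interaction episodes is what gradually shapes the robot's action choices.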
Developing self-organized swarm systems capable of adapting to environmental changes as well as to dynamic situations is a complex challenge. An efficient labour-division model, able to regulate the distribution of work among swarm robots, is an important element of such systems. This paper extends the popular Response Threshold Model (RTM) and proposes a new Adaptive Response Threshold Model (ARTM). Experiments were carried out in simulation and in real-robot scenarios to study the performance of the new adaptive model. The results verify that the extended approach improves the adaptability of previous systems: for example, by reducing collision duration among robots in foraging missions, our approach helps small swarms of robots adapt more efficiently to changing environments, thus increasing their self-sustainability (survival rate). Finally, we propose a minimal version of ARTM, derived from the conclusions drawn from the real-robot and simulation results.
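The classic response-threshold rule underlying RTM gives the probability that a robot engages in a task as P = s² / (s² + θ²), where s is the task stimulus and θ the robot's threshold. The sketch below shows that rule plus a deliberately simple, illustrative threshold-adaptation step; the adaptation rule, rate, and clamp bounds are assumptions for demonstration, not ARTM's actual update.

```python
def response_probability(stimulus, threshold):
    """Classic fixed response-threshold rule:
    P(engage) = s^2 / (s^2 + theta^2)."""
    return stimulus**2 / (stimulus**2 + threshold**2)

def adapt_threshold(threshold, engaged, rate=0.1, lo=0.1, hi=10.0):
    """Toy adaptive step (illustrative only): lower the threshold while
    the robot works on the task, raise it otherwise, clamped to [lo, hi]."""
    theta = threshold - rate if engaged else threshold + rate
    return min(hi, max(lo, theta))
```

Lowering θ for active workers and raising it for idle ones is one simple way a threshold model can redistribute work as conditions change.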
For natural social human-robot interaction, it is essential for a robot to learn human-like social skills. However, learning such skills is notoriously hard because of the limited availability of direct instruction from people. In this paper, we propose an intrinsically motivated reinforcement learning framework in which an agent receives intrinsic motivation-based rewards through an action-conditional predictive model. Using the proposed method, the robot learned social skills from human-robot interaction experiences gathered in real, uncontrolled environments. The results indicate that the robot not only acquired human-like social skills but also made more human-like decisions on a test dataset than a robot that received direct rewards for task achievement.
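A common way to derive an intrinsic reward from an action-conditional predictive model is to use the model's prediction error on the next sensory state as the reward signal. The sketch below shows that generic curiosity-style formulation; the paper's exact reward definition may differ, and the mean-squared-error form and `scale` parameter here are assumptions.

```python
def intrinsic_reward(predicted_next, actual_next, scale=1.0):
    """Curiosity-style intrinsic reward: mean squared error between the
    predictive model's forecast of the next state and the observed next
    state. High error = surprising experience = high reward."""
    n = len(actual_next)
    err = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return scale * err / n
```

The agent thus rewards itself for experiences its model cannot yet predict, removing the need for hand-crafted task rewards.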
Background
Theoretical studies predict that Lévy walks maximize the chance of encountering randomly distributed targets at low density, whereas Brownian walks are favorable inside a patch of high-density targets. Recent experimental data report that some animals indeed show Lévy and Brownian movement patterns when foraging in areas of low and high target density, respectively. This paper presents a simple computational model, driven by Gaussian noise, that can realize such behavior.

Methodology/Principal Findings
We extend the Lévy walk model of one of the simplest creatures, Escherichia coli, based on the biological fluctuation framework. We build a simulation of a simple, generic animal to observe whether Lévy or Brownian walks are performed appropriately depending on the target density, and investigate the emergent behavior in a commonly faced patchy environment where the density alternates.

Conclusions/Significance
With this model, the animal behavior of choosing Lévy or Brownian movement patterns according to target density can be generated without changing the essence of the stochastic property in the Escherichia coli physiological mechanism, as explained by related research. The emergent behavior and its benefits in a patchy environment are also discussed. The model provides a framework for further investigation of the role of internal noise in realizing adaptive and efficient foraging behavior.
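The Lévy/Brownian distinction above comes down to the step-length distribution: Gaussian steps for Brownian motion, heavy-tailed power-law steps for a Lévy walk. The toy rule below switches between the two regimes by local target density; it is an illustration of that distinction only, not the paper's fluctuation-based E. coli model, and the threshold, exponent, and truncation values are invented.

```python
import random

def step_length(local_density, threshold=0.5, mu=2.0, sigma=1.0,
                max_step=100.0):
    """Toy forager step rule: Brownian (Gaussian) steps inside a dense
    patch; truncated power-law (Levy-like) steps in sparse areas,
    drawn by inverse-transform sampling from P(l) ~ l^(-mu), l >= 1."""
    if local_density >= threshold:
        return abs(random.gauss(0.0, sigma))        # Brownian regime
    u = random.random()
    step = (1.0 - u) ** (-1.0 / (mu - 1.0))         # heavy-tailed, >= 1
    return min(step, max_step)                      # truncate the tail
```

With mu near 2, occasional very long steps carry the forager between sparse patches, while the Gaussian regime keeps it searching intensively once inside one.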