SUMMARY We humans can instantaneously detect the regions in a visual scene that are most likely to contain something of interest. Exploiting this pre-selection mechanism, called visual attention, in image and video processing systems would make them more sophisticated and therefore more useful. This paper briefly describes various computational models of human visual attention and their development, as well as related psychophysical findings. In particular, our objective is to carefully distinguish several types of studies related to human visual attention and saliency as a measure of attentiveness, and to provide a taxonomy from several viewpoints, such as the main objective, the use of additional cues, and the underlying mathematical principles. The survey concludes with a discussion of possible future directions for research into human visual attention and saliency computation.
key words: human visual attention, computational model, saliency, bottom-up, top-down

Motivation
Developing sophisticated algorithms for detecting and recognizing objects in a given image or video has been a long-standing challenge in the pattern recognition and computer vision research fields. In fact, a huge number of studies, techniques and theories related to object detection and recognition have already been developed. In particular, several methods for detecting specific categories of objects, such as human bodies and human faces, have already been put to practical use in, for example, surveillance, authentication and the human-centric enhancement of image quality, making the best possible use of prior knowledge about the target objects (human bodies and faces) [1], [2].
However, generic object detection and recognition without any constraints on the target objects has remained a major challenge, because (1) various kinds of objects might constitute the targets, and (2) target objects in the same category might have different appearances due to variations among instances within the category, illumination changes and so on. On the other hand, human beings seem to be able to detect various kinds of objects without any thought or effort. For example, from Fig. 1 left, we can easily and instantly detect a red car, a blue traffic sign and a broad white line. Visual attention [3] is considered to play an important role in achieving this function. Visual attention is a built-in mechanism of the human visual system that quickly selects the regions in a visual scene that are most likely to contain items of interest. Such a pre-selection mechanism, focusing only on relevant data, would be essential in enabling computers to undertake subsequent processing such as generic object detection and recognition.
† The author is with the Graduate School of Informatics, Kyoto University, Kyoto-shi, 606-8501 Japan.
† The author is with the Graduate School of Information Science, Nagoya University, Nagoya-shi, 464-8603 Japan.
a) E-mail: akisato@ieee.org  b) E-mail: yonetani@vision.kuee.kyoto-u.ac.jp  c) E-mail: hirayama@is.nagoya-u.ac.jp
DOI: 10.1587/transinf.E96.D.562
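The pre-selection idea described above can be illustrated with a toy bottom-up saliency computation. The sketch below is not any specific published model: the single intensity channel, the choice of Gaussian scales, and all function names are illustrative assumptions, in the spirit of classical center-surround saliency models, which combine many feature channels and scales.

```python
import numpy as np

def _gaussian_blur(img, sigma):
    """Separable Gaussian blur implemented with numpy only."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    # blur rows, then columns ('same' keeps the image size)
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def intensity_saliency(image, center_sigma=1.0, surround_sigma=5.0):
    """Toy center-surround saliency on a single intensity channel."""
    intensity = np.asarray(image, dtype=np.float64)
    if intensity.ndim == 3:                 # RGB -> mean intensity
        intensity = intensity.mean(axis=2)
    center = _gaussian_blur(intensity, center_sigma)
    surround = _gaussian_blur(intensity, surround_sigma)
    saliency = np.abs(center - surround)    # center-surround difference
    span = saliency.max() - saliency.min()
    return (saliency - saliency.min()) / span if span > 0 else saliency

# A bright blob on a dark background should dominate the saliency map.
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0
smap = intensity_saliency(img)
peak = np.unravel_index(np.argmax(smap), smap.shape)
```

Regions that differ strongly from their surroundings at the chosen scales receive high saliency, which is the pre-selection signal a subsequent detection stage could then focus on.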
Numerous applications, such as autonomous driving, satellite imagery sensing, and biomedical imaging, use computer vision as an important tool for perception tasks. Intelligent Transportation Systems (ITS) require scenes in sensor data to be recognized and located precisely. Semantic segmentation is one of the computer vision methods intended to perform such tasks. However, existing semantic segmentation tasks label each pixel with a single object class. Recognizing object attributes, e.g., pedestrian orientation, is more informative and helps achieve a better scene understanding. Thus, we propose a method that performs semantic segmentation and pedestrian attribute recognition simultaneously. We introduce an attribute-aware loss function that can be applied to an arbitrary base model. Furthermore, a re-annotation of the existing Cityscapes dataset enriches the ground-truth labels by annotating pedestrian orientation attributes. We implement the proposed method and compare the experimental results with those of existing methods. The attribute-aware semantic segmentation outperforms baseline methods both in the traditional object segmentation task and in the expanded attribute detection task.
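The abstract does not spell out the attribute-aware loss, so the following is only a minimal numpy sketch of one plausible form: a standard per-pixel segmentation cross-entropy plus an attribute cross-entropy restricted to ground-truth pedestrian pixels. All function names, the weighting scheme, and the default pedestrian class id are assumptions, not the paper's actual definition.

```python
import numpy as np

def pixel_cross_entropy(probs, labels, mask=None):
    """Mean per-pixel cross-entropy.
    probs: (H, W, C) softmax outputs; labels: (H, W) int class ids;
    mask: optional (H, W) bool selecting which pixels contribute."""
    h, w, _ = probs.shape
    rows, cols = np.indices((h, w))
    p = probs[rows, cols, labels]                 # probability of the true class
    nll = -np.log(np.clip(p, 1e-12, None))
    if mask is not None:
        return float(nll[mask].mean()) if mask.any() else 0.0
    return float(nll.mean())

def attribute_aware_loss(seg_probs, seg_labels, attr_probs, attr_labels,
                         pedestrian_class=11, attr_weight=0.5):
    """Segmentation loss over all pixels plus an attribute (orientation)
    loss restricted to ground-truth pedestrian pixels.
    pedestrian_class=11 assumes the Cityscapes 'person' train id."""
    seg_loss = pixel_cross_entropy(seg_probs, seg_labels)
    ped_mask = (seg_labels == pedestrian_class)
    attr_loss = pixel_cross_entropy(attr_probs, attr_labels, mask=ped_mask)
    return seg_loss + attr_weight * attr_loss
```

Because the extra term is additive and model-agnostic, a sketch like this could in principle be attached to any base segmentation network, which matches the abstract's claim that the loss applies to an arbitrary base model.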
SUMMARY People are inundated with enormous volumes of information and often dither over making the right choices among them. Interactive user support by information service systems, such as concierge services, can effectively assist such people. However, human-machine interaction still lacks naturalness and thoughtfulness despite the widespread use of intelligent systems. To improve the interaction and support users' choices, the system needs to estimate the user's interest. We propose a novel approach to estimating this interest based on the relationship between the dynamics of the user's eye movements, i.e., the endogenous control mode of saccades, and the machine's proactive presentations of visual content. Using a specially designed presentation phase that encourages the user to produce endogenous saccades, we analyzed the timing structures between the saccades and the presentation events. We defined resistance as a novel time-delay feature representing the duration a user's gaze remains fixed on the previously presented content regardless of the next event. In experimental results obtained from 10 subjects, we confirmed that resistance is a good indicator for estimating the interest of most subjects (75% success in 28 experiments on 7 subjects). This demonstrated a higher accuracy than conventional estimates of interest based on gaze duration or frequency.
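Reading the description literally, the resistance feature can be sketched as the time the gaze lingers on the previous content after the next presentation event fires. The snippet below is only an illustrative interpretation under that assumption; the paper's exact definition, its event alignment, and the function names here are not taken from the source.

```python
def resistance(next_event_time, gaze_departure_time):
    """Time-delay feature: how long the gaze stays on the previously
    presented content after the next presentation event fires.

    next_event_time: timestamp of the machine's next content presentation
    gaze_departure_time: timestamp when the gaze first leaves the
        previously presented content (e.g., onset of the saccade away)
    Returns 0.0 if the gaze had already moved on before the event.
    """
    return max(0.0, gaze_departure_time - next_event_time)

def mean_resistance(event_times, departure_times):
    """Average resistance over a sequence of presentation events,
    e.g., as a per-content indicator of user interest."""
    values = [resistance(e, d) for e, d in zip(event_times, departure_times)]
    return sum(values) / len(values) if values else 0.0
```

Under this reading, higher resistance means the user resists being pulled away by new content, which is why it could indicate stronger interest in the previous item than raw gaze duration or fixation frequency.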
Semantic segmentation is a task of interest to many deep learning researchers working on scene understanding. However, recognizing details about objects' attributes can be more informative and helpful for better scene understanding in intelligent vehicle applications. This paper introduces a method for simultaneous semantic segmentation and pedestrian attribute recognition. A modified dataset is built on top of the Cityscapes dataset by adding attribute classes corresponding to pedestrian orientation. The proposed method extends the SegNet model and is trained using both the original and the attribute-enriched datasets. Experiments show that the proposed attribute-aware semantic segmentation approach slightly improves performance on the Cityscapes dataset while expanding its class set through training on the additional data.