As a step towards enhancing users' perceived multimedia quality beyond the level offered by classic audiovisual systems, the authors present the results of an experimental study that examined users' perception of inter-stream synchronization between olfactory data (scent) and video (without relevant audio). The impact on users' quality of experience (considering enjoyment, relevance, and reality) is analyzed and discussed by comparing synchronous with asynchronous presentation of olfactory and video media. The aim is to empirically define the temporal boundaries within which users perceive olfactory data and video to be synchronized. The key analysis compares users' detection and perception of synchronization error. Prior work has investigated temporal boundaries for olfactory data combined with audiovisual media, but no work documents the integration of olfactory data with video alone (no related audio). The results show that the temporal boundaries for olfactory data and video only differ significantly from those for olfactory data, video, and audio. The authors conclude that the absence of contextual audio considerably narrows the acceptable temporal boundary between scent and video. The results also indicate that olfaction presented before video is more noticeable to users than olfaction presented after video, and that users are accordingly more tolerant of olfactory data after video than before it. In addition, the results show the presence of two main synchronization regions. This work is a step towards the definition of synchronization specifications for multimedia applications based on olfactory and video media.
The automatic detection of humans in aerial thermal imagery plays a significant role in various real-time applications, such as surveillance, search and rescue, and border monitoring. Small target size, low resolution, occlusion, pose, and scale variations are the significant challenges in aerial thermal images that degrade the performance of various state-of-the-art object detection algorithms. Although many deep-learning-based object detection algorithms have shown impressive performance on generic object detection tasks, this study analyzes their ability to detect smaller objects in aerial thermal images. This work evaluates the performance of Faster R-CNN and single-shot multi-box detector (SSD) algorithms in detecting human targets in aerial-view thermal images. For this purpose, two standard aerial thermal datasets containing human targets of varying scale are considered, together with different backbone networks, such as ResNet50, Inception-v2, and MobileNet-v1. The evaluation results demonstrate that the Faster R-CNN model trained with the ResNet50 backbone outperformed the other configurations in detection accuracy, with a mean average precision (mAP at 0.5 IoU) of 100% and 55.7% on the test data of the OSU thermal dataset and the AAU PD T dataset, respectively. SSD with MobileNet-v1 achieved the highest detection speed of 44 frames per second (FPS) on an NVIDIA GeForce GTX 1080 GPU. Fine-tuning the anchor parameters of the Faster R-CNN ResNet50 and SSD Inception-v2 algorithms yielded remarkable mAP improvements of 10% and 3.5%, respectively, on the challenging AAU PD T dataset. The experimental results demonstrate the applicability of Faster R-CNN and SSD algorithms to human detection in aerial-view thermal images, and the impact of varying the backbone network and anchor parameters on the performance of these algorithms.
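The mAP at 0.5 IoU metric used above counts a predicted box as a true positive when its intersection-over-union with an unmatched ground-truth box is at least 0.5. A minimal sketch of that matching step, in plain Python with hypothetical `iou` and `match_detections` helpers (illustrative only, not the authors' evaluation code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def match_detections(preds, gts, iou_thr=0.5):
    """Greedily match predictions (box, score), highest score first,
    to ground-truth boxes; each ground truth matches at most once.
    Returns (true_positives, false_positives)."""
    unmatched = list(range(len(gts)))
    tp = fp = 0
    for box, _score in sorted(preds, key=lambda p: -p[1]):
        best, best_iou = None, iou_thr
        for gi in unmatched:
            overlap = iou(box, gts[gi])
            if overlap >= best_iou:
                best, best_iou = gi, overlap
        if best is not None:
            unmatched.remove(best)  # consume the matched ground truth
            tp += 1
        else:
            fp += 1
    return tp, fp
```

Full mAP additionally sweeps the score threshold to build a precision-recall curve per class and averages the resulting precision; the snippet shows only the per-image matching at the 0.5 IoU threshold reported in the abstract.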