Event cameras are bio-inspired vision sensors that naturally capture the dynamics of a scene, filtering out redundant information. This paper presents a deep neural network approach that unlocks the potential of event cameras on a challenging motion-estimation task: prediction of a vehicle's steering angle. To make the most of this sensor-algorithm combination, we adapt state-of-the-art convolutional architectures to the output of event sensors and extensively evaluate the performance of our approach on a publicly available large-scale event-camera dataset (≈1000 km). We present qualitative and quantitative explanations of why event cameras allow robust steering prediction even in cases where traditional cameras fail, e.g., under challenging illumination conditions and fast motion. Finally, we demonstrate the advantages of leveraging transfer learning from traditional to event-based vision, and show that our approach outperforms state-of-the-art algorithms based on standard cameras.
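Event streams are asynchronous, so feeding them to standard convolutional architectures requires a frame-like representation. A minimal sketch of one common choice, per-polarity event count images, is below; the `(x, y, t, polarity)` tuple layout and the two-channel histogram are illustrative assumptions, not the paper's exact encoding:

```python
def event_histogram(events, width, height):
    """Accumulate a packet of events into two count images, one per
    polarity, a common frame-like representation fed to standard CNNs.
    Each event is assumed to be an (x, y, t, polarity) tuple."""
    pos = [[0] * width for _ in range(height)]
    neg = [[0] * width for _ in range(height)]
    for x, y, _t, polarity in events:
        (pos if polarity > 0 else neg)[y][x] += 1
    return pos, neg
```

Stacking the two channels yields an image-shaped tensor, so pretrained frame-based networks can be reused with only an input-layer change, which is one way the transfer learning mentioned above becomes possible.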
Research in stereoscopic 3D coding, transmission, and subjective assessment methodology depends largely on the availability of source content that can be used in cross-lab evaluations. While several studies have already been presented using proprietary content, comparisons between them are difficult because each uses different content. Therefore, this paper introduces in detail a freely available dataset of high-quality Full-HD stereoscopic sequences shot with a semi-professional 3D camera. The content was designed to be suitable for a wide variety of applications, including high-quality studies. A set of depth maps was calculated from the stereoscopic pairs. As an application example, a subjective assessment was performed using coding and spatial degradations, following the Absolute Category Rating with Hidden Reference method. The observers were instructed to vote on video quality only. The results of this experiment are also freely available and are presented in this paper as a first step towards objective video quality measurement for 3DTV.
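The Absolute Category Rating with Hidden Reference method folds each viewer's rating of the hidden, unimpaired reference into a differential score, so that content preferences cancel out. A minimal sketch of the standard ACR-HR computation in the style of ITU-T P.910 (the function name is ours):

```python
def acr_hr_dmos(test_votes, reference_votes):
    """Differential MOS for ACR with Hidden Reference: each viewer's
    5-point vote on a processed sequence is offset by the same viewer's
    vote on the hidden reference, DV = V(test) - V(ref) + 5, so a score
    of 5 means 'as good as the reference'. The DMOS is the mean DV."""
    dvs = [t - r + 5 for t, r in zip(test_votes, reference_votes)]
    return sum(dvs) / len(dvs)
```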
Many existing engineering works model the statistical characteristics of the entities under study as normal distributions. These models are eventually used for decision making, which in practice requires defining the classification region corresponding to the desired confidence level. Surprisingly, however, a great number of computer vision works using multidimensional normal models leave the confidence region unspecified or fail to establish it correctly, due to misconceptions about the properties of Gaussian functions or wrong analogies with the one-dimensional case. The resulting regions incur deviations that can be unacceptable in high-dimensional models. Here we provide a comprehensive derivation of the optimal confidence regions for multivariate normal distributions of arbitrary dimensionality. To this end, we first derive the condition for region optimality of general continuous multidimensional distributions, and then apply it to the widespread case of the normal probability density function. The obtained results are used to analyze the confidence error incurred by previous works in vision research, showing that the deviations caused by wrong regions may become unacceptable as dimensionality increases. To support the theoretical analysis, a quantitative example is given in the context of moving object detection by means of background modeling.
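The dimensionality dependence described above can be made concrete: the optimal confidence region of a multivariate normal is the ellipsoid where the squared Mahalanobis distance stays below a chi-squared quantile with as many degrees of freedom as dimensions. A stdlib-only sketch with our own helper names (the series-based CDF is adequate only for the moderate thresholds shown here):

```python
import math

def chi2_cdf(x, k):
    """CDF of the chi-squared distribution with k degrees of freedom,
    via the power series of the regularized lower incomplete gamma
    function P(k/2, x/2); adequate for the moderate x used here."""
    if x <= 0:
        return 0.0
    a, s = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16 and n < 10000:
        term *= s / (a + n)
        total += term
        n += 1
    return total * math.exp(-s + a * math.log(s) - math.lgamma(a))

def chi2_quantile(p, k):
    """Invert the CDF by bisection to get the confidence threshold."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a standardized normal, the 95% region bound is about 3.84 (1.96 sigma) in one dimension but about 7.81 (2.80 sigma) in three, so carrying the familiar "1.96 sigma" rule into higher dimensions understates the region, which is exactly the kind of error analyzed above.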
Today's packet-switched networks are subject to bandwidth fluctuations that degrade the user experience of multimedia services. To cope with this problem, HTTP Adaptive Streaming (HAS) has been proposed in recent years as a video delivery solution for the future Internet and is being adopted by a growing number of streaming services such as Netflix and YouTube. HAS enables service providers to improve users' Quality of Experience (QoE) and network resource utilization by adapting the quality of the video stream to the current network conditions. However, the resulting time-varying video quality introduces a new type of impairment and thus novel QoE research challenges. Despite various recent attempts to investigate these challenges, many fundamental questions regarding HAS perceptual performance remain open. In this paper, the QoE impact of different technical adaptation parameters, including chunk length, switching amplitude, switching frequency, and temporal recency, is investigated. In addition, the influence of content on the perceptual quality of these parameters is analyzed. To this end, a large number of adaptation scenarios have been subjectively evaluated in four laboratory experiments and one crowdsourcing study. Statistical analysis of the combined dataset reveals results that partly contradict widely held assumptions and provide novel insights into the perceptual quality of adapted video sequences, e.g., interaction effects between quality switching direction (up/down) and switching strategy (smooth/abrupt). The large variety of experimental configurations across the different studies ensures the consistency and external validity of the presented results, which can be utilized to enhance the perceptual performance of adaptive streaming services.
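As context for the adaptation parameters studied above: in its simplest form, a HAS client picks each chunk's rendition from recently measured throughput. A hypothetical sketch of that baseline heuristic (the safety margin, bitrate ladder, and function name are our assumptions, not any deployed player's logic):

```python
def pick_quality(available_bitrates, measured_throughput, safety=0.8):
    """Throughput-based chunk selection, the simplest HAS adaptation
    heuristic: request the highest rendition whose bitrate fits under
    a safety fraction of the measured throughput; fall back to the
    lowest rendition when none fits."""
    feasible = [b for b in available_bitrates
                if b <= safety * measured_throughput]
    return max(feasible) if feasible else min(available_bitrates)
```

How aggressively such a rule switches, and whether it steps through intermediate renditions (smooth) or jumps directly (abrupt), is precisely the kind of strategy choice whose perceptual impact the experiments above quantify.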
Recently, broadcast 3D video content has reached households with the first generation of 3DTV. However, few studies have analyzed the Quality of Experience (QoE) perceived by end-users in this scenario. This paper studies the impact of transmission errors in 3DTV, considering that the video is delivered in side-by-side format over a conventional packet-based network. For this purpose, a novel evaluation methodology is proposed, based on standard single-stimulus methods and designed to keep the viewing conditions as close as possible to those of a home environment. The effects of packet losses on monoscopic and stereoscopic videos are compared using the results of subjective assessment tests. Other aspects specific to 3D content were also measured, such as naturalness, sense of presence, and visual fatigue. The results show that although the final perceived QoE is acceptable, some errors cause significant binocular rivalry and, therefore, substantial visual discomfort.
all the results. This metric is called the mean opinion score (MOS), and it is the basis of most objective video quality metrics, which try to model video quality in a way that correlates as closely as possible with MOS [2]. These solutions, however, are normally quite costly in terms of computing power, and they require measuring the video quality in the pixel domain, typically both before and after the degradation. They are therefore widely used in video codec calibration, but only to a limited extent in network monitoring.

Multimedia quality of service is typically characterized by the Media Delivery Index (MDI) [10], a de facto standard in IPTV deployments. MDI is composed of two measurements: the packet loss rate (PLR) and the delay factor (DF), a measure of packet jitter. It is useful for modeling network issues and effective packet loss, but it assumes that all
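The Delay Factor component of MDI can be sketched by simulating the virtual buffer described in RFC 4445: the buffer fills with each arriving packet and drains at the nominal media rate, and DF is the buffer's spread expressed in media time. A simplified, single-measurement-interval sketch (function name and argument layout are ours):

```python
def delay_factor(arrival_times, packet_bits, media_rate_bps):
    """Simplified Delay Factor (DF) of the Media Delivery Index:
    track a virtual buffer that fills with each arriving packet and
    drains at the nominal media rate; DF is the spread (max - min)
    of that buffer converted to milliseconds of media."""
    buf = 0.0                  # bits currently in the virtual buffer
    prev_t = arrival_times[0]
    lo = hi = 0.0
    for t, bits in zip(arrival_times, packet_bits):
        buf -= media_rate_bps * (t - prev_t)   # drain since last packet
        prev_t = t
        lo = min(lo, buf)
        buf += bits                            # packet arrives
        hi = max(hi, buf)
    return (hi - lo) / media_rate_bps * 1000.0  # milliseconds
```

A perfectly paced stream yields a DF of one packet's duration, while late packets widen the buffer spread, which is why DF serves as a jitter measure.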
There has been a significant increase in the availability of 3D players and displays in recent years. Nonetheless, the amount of 3D content has not experienced a comparable increase. To alleviate this problem, many algorithms for converting images and videos from 2D to 3D have been proposed. Here, we present an automatic learning-based 2D-to-3D image conversion approach, based on the key hypothesis that color images with similar structure are likely to present a similar depth structure. The presented algorithm estimates the depth of a color query image using the prior knowledge provided by a repository of color-plus-depth images. The algorithm clusters this database according to structural similarity and then creates a representative of each color-depth image cluster that is used as a prior depth map. The prior depth map appropriate for a given color query image is selected by comparing the structural similarity, in the color domain, between the query image and the database. The comparison is based on a K-Nearest-Neighbor framework that uses a learning procedure to build an adaptive combination of image feature descriptors. The best correspondences determine the cluster and, in turn, the associated prior depth map. Finally, this prior estimate is refined through a segmentation-guided filtering that produces the final depth map. This approach has been tested on two publicly available databases and compared with several state-of-the-art algorithms to demonstrate its effectiveness.
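The retrieval step described above, matching the query to a cluster representative and returning the associated prior depth map, can be sketched as a nearest-neighbour lookup. The plain Euclidean distance and dictionary layout below stand in for the paper's learned, adaptive combination of feature descriptors:

```python
import math

def nearest_prior_depth(query_descriptor, cluster_reps):
    """Select the prior depth map whose cluster representative is most
    similar to the query image's descriptor. Each representative is
    assumed to be a dict with a 'descriptor' vector and the cluster's
    'depth_map'; Euclidean distance is a simplifying stand-in for a
    learned descriptor combination."""
    best = min(cluster_reps,
               key=lambda rep: math.dist(query_descriptor, rep["descriptor"]))
    return best["depth_map"]
```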