Abstract: Many applications, such as autonomous navigation, urban planning, and asset monitoring, rely on accurate information about objects and their geolocations. In this paper, we propose automatically detecting and computing the coordinates of recurring stationary objects of interest from street view imagery. Our processing pipeline relies on two fully convolutional neural networks: the first segments objects in the images, while the second estimates their distance from the camera. To geolocate all detected objects coherently, we propose a novel custom Markov random field model. The novelty of the resulting pipeline is the combined use of monocular depth estimation and triangulation, which enables automatic mapping of complex scenes containing multiple, visually similar objects of interest. We validate the effectiveness of our approach experimentally on two object classes: traffic lights and telegraph poles. The experiments report high object recall rates and a position precision of approximately 2 m, which approaches the precision of single-frequency GPS receivers.
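The triangulation component of such a pipeline can be sketched as intersecting two bearing rays from two camera positions. This is a minimal illustration, not the paper's implementation: it assumes a local east/north metric frame and omits the monocular depth estimates and the Markov random field that the paper combines with triangulation.

```python
import math

def triangulate(cam1, bearing1, cam2, bearing2):
    """Intersect two bearing rays in a local east/north metric frame.

    cam1, cam2: (east, north) camera positions in metres.
    bearing1, bearing2: viewing directions in degrees clockwise from north.
    Returns the (east, north) intersection, i.e. the object position.
    """
    d1 = (math.sin(math.radians(bearing1)), math.cos(math.radians(bearing1)))
    d2 = (math.sin(math.radians(bearing2)), math.cos(math.radians(bearing2)))
    # Solve cam1 + t1*d1 = cam2 + t2*d2 for t1 via the 2D cross product.
    cross = lambda a, b: a[0] * b[1] - a[1] * b[0]
    denom = cross(d1, d2)
    if abs(denom) < 1e-9:
        raise ValueError("rays are (nearly) parallel; no stable intersection")
    diff = (cam2[0] - cam1[0], cam2[1] - cam1[1])
    t1 = cross(diff, d2) / denom
    return (cam1[0] + t1 * d1[0], cam1[1] + t1 * d1[1])
```

With two cameras 10 m apart looking at 45° and 315°, the rays meet 5 m east and 5 m north of the first camera.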
The Standard Hough Transform is a popular method in image processing and is traditionally computed using histograms. Densities modeled with histograms in high-dimensional spaces and/or from few observations can be very sparse and highly demanding in memory. In this paper, we first propose extending the formulation to continuous kernel estimates. Second, when dependencies between variables are properly taken into account, the estimated density is also robust to noise and insensitive to the choice of origin of the spatial coordinates. Finally, our new statistical framework is unsupervised (all required parameters are estimated automatically) and flexible (priors can easily be attached to the observations). We show experimentally that our new model better encodes the alignment content of images.
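The core idea of replacing the histogram accumulator with a continuous kernel estimate can be sketched as follows. This is a simplified illustration, not the paper's method: the bandwidths and grid sizes are fixed illustrative choices, whereas the paper estimates all parameters automatically.

```python
import numpy as np

def kernel_hough_peak(points, n_theta=32, n_rho=41, bw=(0.15, 0.4)):
    """Detect the strongest line via a Gaussian kernel density over
    Hough (theta, rho) space instead of a binned histogram.

    points: (N, 2) array of edge-point coordinates.
    bw: illustrative kernel bandwidths for (theta, rho).
    Returns the (theta, rho) of the density peak.
    """
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # Each edge point contributes one (theta, rho) sample per orientation:
    # rho = x*cos(theta) + y*sin(theta).
    rho = points[:, 0][None, :] * np.cos(thetas)[:, None] \
        + points[:, 1][None, :] * np.sin(thetas)[:, None]
    samples = np.column_stack([np.repeat(thetas, points.shape[0]), rho.ravel()])
    # Evaluate the kernel density on a regular grid and take its argmax.
    r = np.abs(rho).max()
    tg, rg = np.meshgrid(thetas, np.linspace(-r, r, n_rho), indexing="ij")
    grid = np.column_stack([tg.ravel(), rg.ravel()])
    d = (grid[:, None, :] - samples[None, :, :]) / np.asarray(bw)
    density = np.exp(-0.5 * (d ** 2).sum(axis=2)).sum(axis=1)
    return grid[density.argmax()]
```

For points on the line y = x, the density peaks near theta = 3π/4 and rho = 0, where all the per-point sinusoids intersect.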
This paper addresses the problems of flood classification and flood aftermath detection using both social media and satellite imagery. Automatic detection of disasters such as floods is still a very challenging task; the focus here lies on identifying passable routes or roads during floods. Two novel solutions are presented, developed for two corresponding tasks at the MediaEval 2018 benchmarking challenge: (i) identification of images providing evidence for road passability and (ii) differentiation and detection of passable and non-passable roads in images, from two complementary sources of information. For the first task, we mainly rely on object- and scene-level features extracted with multiple deep models pre-trained on the ImageNet and Places datasets. The object- and scene-level features are then combined using early, late, and double fusion techniques. To identify whether it is possible for a vehicle to pass a road in satellite images, we rely on convolutional neural networks and a transfer-learning-based classification approach. The evaluation of the proposed methods is carried out on the large-scale datasets provided for the benchmark competition. The results demonstrate a significant performance improvement over recent state-of-the-art approaches.
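The late-fusion step described above can be sketched as a weighted average of per-model class probabilities. This is a minimal illustration under assumed inputs: the weights are illustrative, and the early and double fusion variants the paper also uses are omitted.

```python
import numpy as np

def late_fuse(prob_maps, weights=None):
    """Late fusion: weighted average of class-probability arrays
    produced by several models (e.g. object- vs. scene-level features).

    prob_maps: list of (n_samples, n_classes) probability arrays.
    weights: optional per-model weights; uniform if omitted.
    Returns the fused (n_samples, n_classes) probabilities.
    """
    P = np.stack([np.asarray(p, dtype=float) for p in prob_maps])
    w = np.ones(P.shape[0]) if weights is None else np.asarray(weights, float)
    w = w / w.sum()  # normalise so the fused rows remain valid probabilities
    return np.tensordot(w, P, axes=1)
```

With uniform weights, fusing an object-level model's scores [0.8, 0.2] with a scene-level model's [0.6, 0.4] yields [0.7, 0.3] for that image.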
In this paper, we propose a multimodal approach to illicit content detection in videos. Distribution of pornographic material over computer networks has been taking place since the inception of the internet. Until recently, most research has focused on illicit content detection in images and text, typically involving robust skin detection, texture characterisation, shape modelling, and keyword filtering. Video, however, provides the opportunity to exploit supplementary features, including audio and motion, for additional confidence in the classification process. This work investigates the use of visual motion information and periodicity in the audio stream for illicit content detection in videos.
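Audio periodicity of the kind this approach exploits can be measured with a normalised autocorrelation. This is a simplified sketch, not the paper's feature extractor: it operates on raw samples rather than a loudness envelope, and the frequency range searched is an illustrative assumption.

```python
import numpy as np

def audio_periodicity(x, sr, min_hz=0.5, max_hz=5.0):
    """Score how periodic a signal is via normalised autocorrelation.

    x: 1-D signal; sr: sample rate in Hz.
    min_hz/max_hz: illustrative search range for the repetition rate.
    Returns (peak strength in [0, 1], dominant frequency in Hz).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Autocorrelation at non-negative lags, normalised by the zero-lag energy.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / ac[0]
    # Pick the strongest lag inside the candidate period range.
    lo, hi = int(sr / max_hz), int(sr / min_hz)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return float(ac[lag]), sr / lag
```

A strongly rhythmic signal yields a peak strength close to 1 at the lag of its repetition period, while aperiodic noise yields a low score across all lags.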