In this paper, we present an integrated system for smart encoding in video surveillance. This system, developed within the European IST WCAM project, aims at defining an optimized JPEG 2000 codestream organization based directly on the semantic content extracted by the video surveillance analysis module. The proposed system produces a fully compliant Motion JPEG 2000 stream in which the data of regions of interest (typically mobile objects) are placed in a separate layer from regions of lesser interest (e.g. the static background). First, the system performs a real-time, unsupervised segmentation of mobile objects in each frame of the video. The smart encoding module then uses these region-of-interest maps to construct a Motion JPEG 2000 codestream that allows optimized rendering of the video surveillance stream in low-bandwidth wireless applications, allocating more quality to mobile objects than to the background. Our integrated system improves the coding representation of the video content without data overhead. It can also be used in applications requiring selective scrambling of regions of interest, as well as in any other application dealing with regions of interest.
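The layered organization described above can be illustrated with a small sketch. This is hypothetical code, not the WCAM implementation: a binary mobile-object mask partitions a frame's code blocks into an ROI layer and a background layer, and an assumed byte budget is then shared unevenly between the two layers. All function and parameter names are invented for illustration.

```python
# Hypothetical sketch of ROI-driven layer allocation (not the WCAM system).

def split_layers(mask, block_size=8):
    """mask: 2D list of 0/1 pixels marking mobile objects.
    Returns (roi_blocks, bg_blocks): lists of (row, col) block indices.
    A block goes to the ROI layer if any of its pixels belongs to a mobile."""
    rows, cols = len(mask), len(mask[0])
    roi, bg = [], []
    for br in range(0, rows, block_size):
        for bc in range(0, cols, block_size):
            block = [mask[r][c]
                     for r in range(br, min(br + block_size, rows))
                     for c in range(bc, min(bc + block_size, cols))]
            (roi if any(block) else bg).append((br // block_size, bc // block_size))
    return roi, bg

def allocate_budget(n_roi, n_bg, total_bytes, roi_share=0.7):
    """Give roi_share of the total byte budget to ROI blocks and the rest
    to background blocks; returns the per-block budget for each layer."""
    roi_bytes = int(total_bytes * roi_share / max(n_roi, 1))
    bg_bytes = int(total_bytes * (1 - roi_share) / max(n_bg, 1))
    return roi_bytes, bg_bytes
```

Because both layers live in a single compliant codestream, a decoder on a low-bandwidth link can simply truncate the background layer first, which is the behaviour the abstract targets.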
Cinematography with Unmanned Aerial Vehicles (UAVs) is an emerging technology promising to revolutionize media production. On the one hand, manually controlled drones already provide advantages, such as flexible shot setup, opportunities for novel shot types, and access to difficult-to-reach spaces and/or viewpoints. Moreover, little additional ground infrastructure is required. On the other hand, ...

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 731667 (MULTIDRONE). This publication reflects the authors' views only. The European Commission is not responsible for any use that may be made of the information it contains.
Simultaneous Localization and Mapping (SLAM) research has reached a level of maturity enabling systems to autonomously build an accurate sparse map of the environment while localizing themselves in that map. At the same time, the use of deep learning has recently brought great improvements in Monocular Depth Prediction (MDP). Some applications, such as autonomous drone navigation and obstacle avoidance, require dense structure information and cannot rely only on a sparse SLAM representation. We propose to densify a state-of-the-art SLAM algorithm using deep learning-based dense MDP at keyframe rate. Towards this goal, we describe a scale recovery from SLAM landmarks by minimizing a depth error metric, combined with a multi-view depth refinement using a volumetric approach. We conclude with experiments that attest to the added value of our approach in terms of depth estimation.
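The scale-recovery step can be sketched as a least-squares fit. The abstract does not specify the exact depth error metric, so the closed-form squared-error version below is an assumption: find the scalar s minimizing the sum over landmarks of (s * pred_i - slam_i)^2, where slam_i is a landmark depth from SLAM and pred_i the network's predicted depth at the same pixel.

```python
def recover_scale(slam_depths, pred_depths):
    """Least-squares scale s minimizing sum_i (s * pred_i - slam_i)^2
    over landmarks observed in the current keyframe.
    Closed form: s = sum(slam_i * pred_i) / sum(pred_i ** 2)."""
    num = sum(s * p for s, p in zip(slam_depths, pred_depths))
    den = sum(p * p for p in pred_depths)
    if den == 0:
        raise ValueError("predicted depths are all zero")
    return num / den
```

In practice a robust variant (e.g. the median of the per-landmark ratios slam_i / pred_i) is often preferred, since a few outlier landmarks can bias the squared-error fit.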
This paper presents an exemplar-based metric learning framework dedicated to robust visual localization in complex scenes, e.g. street images. The proposed framework learns off-line a specific (local) metric for each image of the database, so that the distance between a database image and a query image representing the same scene is smaller than the distance between the current image and other images of the database. To achieve this goal, we generate geometric and photometric transformations as proxies for query images. From the generated constraints, the learning problem is cast as a convex optimization problem over the cone of positive semi-definite matrices, which is efficiently solved using a projected gradient descent scheme. Successful experiments, conducted using a freely available geo-referenced image database, reveal that the proposed method significantly improves results over the metric in the input space, while being as efficient at test time. In addition, we show that the model learns discriminating features for the localization task, and is able to gain invariance to meaningful transformations.

Index Terms — content-based image retrieval, supervised metric learning, visual localization, place recognition

CONTEXT

The problem tackled in this paper is visual localization at street level ([7], mobile cell phone information [6], ...). Our typical scenario is the precise localization of a vehicle whose approximate position is known. Our problem is cast as an image retrieval (IR) problem using only 2D features extracted from the acquired images and 2D geo-referenced database images (Fig. 1). Exploiting only image content is challenging because, even if query and database images depict the same scene, camera viewpoints, illumination and colorimetry differ, and the scene itself may have changed over time and been occluded.

Fig. 1. Our system aims at answering the following question: knowing a rough position of the vehicle in a street and the scene being observed by the vehicle's camera, can we determine where exactly it is along the street?

Preliminary experiments made clear that standard image retrieval approaches may not always be selective enough for street areas, because the same features tend to be shared by several neighbouring images. Standard matching methods can be classified into voting-based strategies [8] and methods relying on the Bag of Words (BOW) model [9]. For each query image descriptor, voting-based methods [8] search for the N nearest descriptors among the database descriptors. Each of these N nearest descriptors votes for the database image it belongs to. The images with the highest vote counts are likely to be similar images. Ultimately, geometric verification is often used to further improve performance. These methods are effective, but are very time-consuming and do not scale well to large databases. BOW-based methods [9] quantize the local descriptors of images with a codebook of visual words to generate a visual-word histogram. The codebook is previously learned by clustering the feature space of a...
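The abstract's learning scheme (a convex problem over PSD matrices, solved by projected gradient descent on constraints generated from transformed copies of each exemplar) can be illustrated in a deliberately simplified form. The sketch below is not the paper's method: it restricts the metric to a diagonal matrix, for which the projection onto the PSD cone reduces to clipping the weights at zero, and uses a hinge loss on positive/negative pairs. All names are hypothetical.

```python
def train_diagonal_metric(x, positives, negatives, margin=1.0, lr=0.01, epochs=100):
    """Toy projected-gradient sketch of exemplar metric learning.
    Learns nonnegative per-dimension weights w so that the weighted squared
    distance d_w(x, p) = sum_k w_k * (x_k - p_k)**2 is smaller to positives
    (transformed proxies of x) than to negatives, by at least `margin`.
    For a diagonal metric, PSD projection is just clipping weights at 0."""
    d = len(x)
    w = [1.0] * d  # start from the Euclidean metric

    def dist(w, a, b):
        return sum(wk * (ak - bk) ** 2 for wk, ak, bk in zip(w, a, b))

    for _ in range(epochs):
        for p in positives:
            for n in negatives:
                # hinge constraint: want dist(x, n) - dist(x, p) >= margin
                if dist(w, x, n) - dist(w, x, p) < margin:
                    for k in range(d):
                        grad = (x[k] - n[k]) ** 2 - (x[k] - p[k]) ** 2
                        w[k] += lr * grad
                    w = [max(wk, 0.0) for wk in w]  # projection step
    return w
```

The full-matrix version of the paper replaces the clipping step with an eigendecomposition that zeroes out negative eigenvalues; the gradient structure of the hinge constraints is otherwise analogous.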
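The voting-based strategy described above can be sketched directly: each query descriptor finds its N nearest database descriptors (brute force here, which is exactly why these methods scale poorly) and each neighbour casts one vote for its source image. Names and the flat list layout are assumptions for illustration.

```python
def vote_retrieve(query_descs, db_descs, n_nearest=3):
    """Voting-based matching sketch.
    query_descs: list of descriptors (lists of floats).
    db_descs: list of (image_id, descriptor) pairs.
    For each query descriptor, the n_nearest database descriptors
    (by squared Euclidean distance) each vote for their image."""
    votes = {}
    for q in query_descs:
        ranked = sorted(
            db_descs,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(q, item[1])))
        for image_id, _ in ranked[:n_nearest]:
            votes[image_id] = votes.get(image_id, 0) + 1
    # images with the most votes are the likeliest matches
    return sorted(votes.items(), key=lambda kv: -kv[1])
```

The final ranking is typically refined with geometric verification (e.g. an epipolar or homography check on the matched keypoints), as the text notes.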
Autonomous or semi-autonomous navigation of UAVs is of great interest in the Defense and Security domains, as it significantly improves their efficiency and responsiveness during operations. Perception of the environment, and in particular dense, metric 3D mapping in real time, is a priority for navigation and obstacle avoidance. We therefore present our strategy to estimate a dense 3D map by combining a sparse map estimated by a state-of-the-art Simultaneous Localization and Mapping (SLAM) system with a dense depth map predicted by a monocular self-supervised method. A lightweight, volumetric multi-view fusion solution is then used to build and update a voxel map.
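The volumetric fusion step can be sketched as follows. This is a hypothetical simplification, not the abstract's implementation: each 3D point back-projected from a depth map updates a sparse voxel grid (a dict keyed by voxel index) with a weighted running mean, in the spirit of TSDF-style fusion. The voxel size is an assumed parameter.

```python
VOXEL_SIZE = 0.1  # metres; assumed resolution for this sketch

def voxel_index(point):
    """Map a 3D point (metres) to its integer voxel index."""
    return tuple(int(c // VOXEL_SIZE) for c in point)

def fuse_point(grid, point, weight=1.0):
    """Fuse a 3D observation into the sparse voxel grid with a
    weighted running mean; repeated observations of the same voxel
    are averaged, which makes the map cheap to update incrementally."""
    idx = voxel_index(point)
    if idx in grid:
        mean, w = grid[idx]
        new_w = w + weight
        new_mean = tuple((m * w + p * weight) / new_w
                         for m, p in zip(mean, point))
        grid[idx] = (new_mean, new_w)
    else:
        grid[idx] = (point, weight)
    return idx
```

Keeping the grid sparse (only observed voxels are stored) is what makes such a fusion scheme lightweight enough for onboard, real-time use.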