Detecting the camera model used to shoot a picture makes it possible to solve a wide range of forensic problems, from copyright infringement to ownership attribution. For this reason, the forensic community has developed a set of camera model identification algorithms that exploit characteristic traces left on acquired images by the processing pipeline specific to each camera model. In this paper, we investigate a novel approach to the camera model identification problem. Specifically, we propose a data-driven algorithm based on convolutional neural networks, which learns features characterizing each camera model directly from the acquired pictures. Results on a well-known dataset of 18 camera models show that: (i) the proposed method outperforms up-to-date state-of-the-art algorithms on the classification of 64x64 color image patches; (ii) features learned by the proposed network generalize to camera models never used for training.
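A minimal sketch of the patch-level protocol described above, assuming a generic patch classifier: the `predict_patch` function, the non-overlapping tiling, and the majority vote used to aggregate patch decisions into an image-level attribution are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def extract_patches(image, size=64):
    """Tile an H x W x 3 image into non-overlapping size x size color patches."""
    h, w, _ = image.shape
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def identify_camera(image, predict_patch, num_models=18):
    """Attribute an image to one of num_models cameras by majority vote over
    per-patch predictions (predict_patch is a hypothetical helper assumed to
    return a class index for a single 64x64 color patch)."""
    votes = np.zeros(num_models, dtype=int)
    for patch in extract_patches(image):
        votes[predict_patch(patch)] += 1
    return int(np.argmax(votes))
```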
Pedestrian detection is a popular research topic due to its paramount importance for a number of applications, especially in the automotive, surveillance and robotics fields. Despite significant improvements, pedestrian detection remains an open challenge that calls for ever more accurate algorithms. In the last few years, deep learning, and in particular convolutional neural networks, has emerged as the state of the art in terms of accuracy for a number of computer vision tasks, such as image classification, object detection and segmentation, often outperforming the previous gold standards by a large margin. In this paper, we propose a pedestrian detection system based on deep learning, adapting a general-purpose convolutional network to the task at hand. By thoroughly analyzing and optimizing each step of the detection pipeline, we propose an architecture that outperforms traditional methods, achieving an accuracy close to that of state-of-the-art approaches while requiring low computational time. Finally, we test the system on an NVIDIA Jetson TK1, a 192-core platform envisioned as a forerunner of the computational brains of future self-driving cars.
Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.
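The rate-distortion-optimized mode decision described above can be summarized as choosing, for each set of features, the coding mode that minimizes the Lagrangian cost J = D + λR. The sketch below, with hypothetical `encode_intra` and `encode_inter` helpers, illustrates the idea under these assumptions rather than reproducing the exact scheme.

```python
def rd_mode_decision(features, reference, encode_intra, encode_inter, lam):
    """Pick intra- or interframe coding for a set of local features by
    minimizing the Lagrangian cost J = D + lam * R. The two encoders are
    hypothetical helpers returning (bitstream, rate_bits, distortion)."""
    candidates = {
        "intra": encode_intra(features),             # exploits spatial redundancy
        "inter": encode_inter(features, reference),  # exploits temporal redundancy
    }
    best_mode = min(candidates,
                    key=lambda m: candidates[m][2] + lam * candidates[m][1])
    return best_mode, candidates[best_mode][0]
```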
We compare two paradigms for image analysis in visual sensor networks (VSNs). In the compress-then-analyze (CTA) paradigm, images acquired from camera nodes are compressed and sent to a central controller for further analysis. Conversely, in the analyze-then-compress (ATC) approach, camera nodes perform visual feature extraction and transmit a compressed version of these features to a central controller. We focus on state-of-the-art binary features, which are particularly suitable for resource-constrained VSNs, and we show that the "winning" paradigm depends primarily on the network conditions. Indeed, while the ATC approach might be the only viable way to perform analysis at low available bitrates, the CTA approach achieves the best results when the available bandwidth enables the transmission of high-quality images.
Binary descriptors have recently emerged as low-complexity alternatives to state-of-the-art descriptors such as SIFT. The descriptor is represented by means of a binary string, in which each bit is the result of a pair-wise comparison of smoothed pixel values properly selected in a patch around each keypoint. Previous works have focused on the construction of the descriptor, neglecting the opportunity of performing lossless compression. In this paper, we propose two contributions. First, we design an entropy coding scheme that seeks the internal ordering of the descriptor that minimizes the number of bits necessary to represent it. Second, we compare different selection strategies that can be adopted to identify which pair-wise comparisons to use when building the descriptor. Unlike previous works, we evaluate the discriminative power of descriptors as a function of rate, in order to investigate the trade-offs arising in a bandwidth-constrained scenario.
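As a concrete illustration of the construction described above, the following sketch computes a BRIEF-style binary string from pair-wise comparisons of smoothed pixel values; the Gaussian smoothing scale, the patch size and the random pair selection strategy are illustrative assumptions, not the specific choices evaluated in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def binary_descriptor(patch, pairs, sigma=2.0):
    """Compute a binary descriptor for a grayscale patch around a keypoint:
    each bit is 1 iff the smoothed intensity at the first point of a pair
    exceeds that at the second point."""
    smoothed = gaussian_filter(patch.astype(float), sigma=sigma)
    return np.array([smoothed[y1, x1] > smoothed[y2, x2]
                     for (y1, x1), (y2, x2) in pairs], dtype=np.uint8)

# One possible selection strategy: 256 random pixel pairs in a 32x32 patch.
rng = np.random.default_rng(0)
pairs = [((rng.integers(32), rng.integers(32)),
          (rng.integers(32), rng.integers(32))) for _ in range(256)]
```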
Camera model identification is paramount to verifying image origin and authenticity in a blind fashion. State-of-the-art techniques leverage the analysis of features describing characteristic footprints left on images by the acquisition pipeline of each camera model (e.g., traces left by proprietary demosaicing strategies). Motivated by the very accurate performance achieved by feature-based methods, as well as by the progress brought by deep architectures in machine learning, we explore in this paper the possibility of taking advantage of convolutional neural networks (CNNs) for camera model identification. More specifically, we investigate: (i) the capability of different network architectures to learn discriminant features directly from the observed images; (ii) the dependency between the amount of training data and the achieved accuracy; (iii) the importance of selecting a correct protocol for training, validation and testing. This study shows that promising results can be obtained on small image patches by training a CNN with an affordable setup (i.e., a personal computer with one dedicated GPU) in a reasonable amount of time (i.e., approximately one hour), provided that a sufficient number of training images is available.
[Figure: candidate CNN architectures, each built from Conv-1 (32), Pool-1, Conv-2 (48), Pool-2, Conv-3 (64), Pool-3, Conv-4 (128), InnerProduct-1, ReLU-1, InnerProduct-2 and SoftMax layers]
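Based on the layer sequence recoverable from the figure (Conv 32 → 48 → 64 → 128 with interleaved pooling, followed by an inner-product layer, ReLU, a second inner-product layer and softmax), a minimal PyTorch sketch follows. Kernel sizes, strides, the 64x64 input resolution and the 18-class output are assumptions not stated in the abstract.

```python
import torch
import torch.nn as nn

class CameraModelCNN(nn.Module):
    """CNN following the Conv/Pool sequence in the figure; 3x3 kernels,
    2x2 max pooling, 64x64 RGB input and 18 classes are assumptions."""
    def __init__(self, num_classes=18):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # Conv-1 (32)
            nn.MaxPool2d(2),                              # Pool-1
            nn.Conv2d(32, 48, kernel_size=3, padding=1),  # Conv-2 (48)
            nn.MaxPool2d(2),                              # Pool-2
            nn.Conv2d(48, 64, kernel_size=3, padding=1),  # Conv-3 (64)
            nn.MaxPool2d(2),                              # Pool-3
            nn.Conv2d(64, 128, kernel_size=3, padding=1), # Conv-4 (128)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 128),  # InnerProduct-1
            nn.ReLU(),                    # ReLU-1
            nn.Linear(128, num_classes),  # InnerProduct-2
        )

    def forward(self, x):
        # SoftMax applied at inference; training would typically use
        # nn.CrossEntropyLoss on the raw logits instead.
        return torch.softmax(self.classifier(self.features(x)), dim=1)
```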
Technology is quickly revolutionizing our everyday lives, helping us to perform complex tasks. The Internet of Things (IoT) paradigm is becoming more and more popular and is key to the development of Smart Cities. Among all the applications of IoT in the context of Smart Cities, real-time parking lot occupancy detection has recently gained a lot of attention. Solutions based on computer vision yield good performance in terms of accuracy and can be deployed on top of visual sensor networks. Since the problem of detecting vacant parking lots is usually distributed over multiple cameras, ad-hoc algorithms for content acquisition and transmission have to be devised. A traditional paradigm consists in acquiring and encoding images or videos and transmitting them to a central controller, which is responsible for analyzing such content. A novel paradigm, which moves part of the analysis to the sensing devices, is quickly becoming popular. We propose a system for distributed parking lot occupancy detection based on the latter paradigm, showing that onboard analysis and transmission of simple features outperform the traditional paradigm in terms of overall rate-energy-accuracy performance.