The forensic research community keeps proposing new techniques to analyze digital images and videos. However, the performance of the proposed tools is usually tested on data that are far from reality in terms of resolution, source device, and processing history. Notably, in recent years portable devices have become the preferred means of capturing images and videos, and content is commonly shared through social media platforms (SMPs, e.g., Facebook and YouTube). These facts pose new challenges to the forensic community: for example, most modern cameras feature digital stabilization, which has been shown to severely hinder the performance of video source identification technologies; moreover, the strong re-compression enforced by SMPs during upload threatens the reliability of multimedia forensic tools. On the other hand, portable devices capture both images and videos with the same sensor, opening new forensic opportunities. The goal of this paper is to propose the VISION dataset as a contribution to the development of multimedia forensics. The VISION dataset currently comprises 34,427 images and 1,914 videos, both in their native format and in their social version (Facebook, YouTube, and WhatsApp are considered), from 35 portable devices of 11 major brands. VISION can be exploited as a benchmark for the exhaustive evaluation of several image and video forensic tools.
Millions of users share images and videos captured with mobile devices on social media platforms, often under different profiles. When publishing illegal content, they prefer to use anonymous profiles. Multimedia forensics allows us to determine whether videos or images have been captured with the same device, and thus, possibly, by the same person. Currently, the most promising technology for this task exploits the unique traces left by the camera sensor in the visual content. However, image and video source identification are still treated separately from one another. This approach is limited and anachronistic if we consider that most visual media are today acquired with smartphones, which capture both images and videos. In this paper we overcome this limitation by exploring a new approach that synergistically exploits images and videos to identify the device from which they both come. Indeed, we prove that it is possible to identify the source of a digital video by exploiting a reference sensor pattern noise generated from still images taken by the same device. The proposed method achieves performance comparable to, or even better than, the state of the art, where a reference pattern is estimated from video frames. Finally, we show that this strategy is effective even for in-camera digitally stabilized videos, where a non-stabilized reference is not available, thus overcoming a limitation of the current state of the art. We also show how this approach allows us to link social media profiles containing images and videos captured by the same sensor.
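To make the matching step concrete, the sketch below estimates a sensor fingerprint from still images and correlates it with the residual of a probe video. It is a minimal illustration, not the authors' pipeline: the Gaussian denoiser, plain averaging, and normalized correlation are stand-ins for the wavelet denoising, maximum-likelihood weighting, and Peak-to-Correlation Energy (PCE) statistic typically used in PRNU work, and it assumes images and frames share the same resolution (real photo/video matching must also handle cropping and scaling).

```python
# Hypothetical PRNU-style matching sketch; all function names and
# parameter choices here are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(img):
    """W = I - denoise(I); a Gaussian blur stands in for the wavelet
    denoiser commonly used in the PRNU literature."""
    img = np.asarray(img, dtype=np.float64)
    return img - gaussian_filter(img, sigma=1.0)

def estimate_fingerprint(still_images):
    """Reference sensor pattern noise: average of the still-image residuals."""
    return np.mean([noise_residual(i) for i in still_images], axis=0)

def match_score(fingerprint, video_frames):
    """Normalized correlation between the image-based reference and the
    averaged residual of the probe video's frames."""
    probe = np.mean([noise_residual(f) for f in video_frames], axis=0)
    a = fingerprint - fingerprint.mean()
    b = probe - probe.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A decision would compare match_score against a threshold calibrated on known non-matching pairs; the appeal of the image-based reference, as the abstract notes, is that it remains available even when no non-stabilized video reference exists.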
Video forensics is an emerging discipline that aims to infer, in a blind fashion, information about the processing history undergone by a digital video. In this work we introduce a new forensic footprint and, based on it, propose a method for detecting whether a video has been encoded twice; if this is the case, we also estimate the size of the Group Of Pictures (GOP) employed during the first encoding. As shown in the experiments, the footprint proves very robust even in realistic settings (i.e., when encoding is carried out at typical compression rates) that are rarely addressed by existing techniques.
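As an illustration of the kind of analysis described above, the sketch below estimates the first-encoding GOP size from a generic per-frame footprint sequence. The specific feature (left abstract here; in practice it would come from decoder-level quantities such as prediction-type statistics) and the simple peak-contrast score are assumptions for illustration, not the paper's actual footprint.

```python
import numpy as np

def estimate_first_gop(feature, max_gop=50):
    """Given one footprint value per frame of the re-encoded video, score
    each candidate GOP size G by how much the feature at multiples of G
    (the would-be I-frames of the first encoding) stands out from the
    remaining frames; return the best G, or 0 if nothing stands out
    (i.e., the video looks singly encoded)."""
    feature = np.asarray(feature, dtype=np.float64)
    n = len(feature)
    best_g, best_score = 0, 0.0
    for g in range(2, min(max_gop, n - 1) + 1):
        on = feature[g::g]                            # candidate first-GOP I-frames
        off = np.delete(feature, np.arange(0, n, g))  # all other frames
        if on.size == 0 or off.size == 0:
            continue
        score = on.mean() - off.mean()
        if score > best_score:
            best_g, best_score = g, score
    return best_g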