The Cover-Source Mismatch (CSM) has long been recognized as a major problem in modern steganography and steganalysis. Indeed, while the vast majority of works in steganography and steganalysis have been tailored to a specific reference database, namely BOSSbase, recent works show that, because of CSM, results may differ greatly when the dataset changes. Although the CSM has already been the subject of several publications, these prior works investigated only a few elements in a limited setup. The goal of the current paper is to study the effects of the CSM in a more comprehensive manner and then to examine and compare different strategies for mitigating it. It first defines two parameters involved in the CSM, the source difficulty and the source inconsistency. Then, using different steganographic schemes and feature sets, it provides a systematic study of the various factors that can give rise to CSM in image steganalysis. Finally, two practical ways to mitigate the CSM are presented: a training technique promoting diversity across different sources, and one promoting specificity to a single targeted source, identified beforehand by training a multi-class classifier. Their performances are compared for different training set sizes.
This short paper presents a novel method for steganography in JPEG-compressed images, extending the so-called MiPOD scheme, which is based on minimizing the detection accuracy of the most-powerful test using a Gaussian model of independent DCT coefficients. This method is also applied to the problem of embedding into color JPEG images. The main issue in such a case is that color channels are not processed in the same way; hence, a statistically based approach is expected to bring significant improvements when heterogeneous channels must be considered together. The results show that, on the one hand, the extension of MiPOD to the JPEG domain, referred to as J-MiPOD, is very competitive with current state-of-the-art embedding schemes. On the other hand, we also show that embedding in color JPEG images is far from straightforward and that future work is required to better understand how to deal with color channels in JPEG images.
It is now well known that practical steganalysis using machine learning techniques can be strongly biased by the problem of Cover Source Mismatch. Such a phenomenon usually occurs in machine learning when the training and the testing sets are drawn from different sources, i.e. when they do not share the same statistical properties. In the field of steganalysis, however, because the signal targeted by steganalysis methods has very low power, this mismatch can drastically lower their performance. This paper aims to define, through practical experiments, what a source is in steganalysis. By assuming that two cover datasets coming from a common source should provide comparable performance in steganalysis, it is shown that the definition of a source is more closely related to the processing pipeline of the RAW images than to the sensor or the acquisition setup of the pictures. In order to measure the discrepancy between sources, this paper introduces the concept of consistency between sources, which quantifies how much two sources are subject to Cover Source Mismatch. We show that by adopting "training design", we can increase the consistency between the training set and the testing set. To measure how much image processing operations may help the steganographer, this paper also introduces the intrinsic difficulty of a source. It is observed that some processes, such as the choice of JPEG quantization tables or the development pipeline, can dramatically increase or decrease the performance of steganalysis methods, and that other parameters, such as the ISO sensitivity or the sensor model, have a minor impact on performance.
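The two quantities introduced in the abstract above can be illustrated with a minimal sketch. It assumes (the abstract does not give formal definitions) that the intrinsic difficulty of a source is read as the detection error obtained when training and testing on that same source, and that the inconsistency between two sources is the additional error incurred when the detector is trained on one and tested on the other. The function names and the error rates are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def intrinsic_difficulty(pe, source):
    """Error rate when the detector is trained and tested on the same source.

    Assumed reading of "intrinsic difficulty"; not the paper's formal definition.
    """
    return pe[(source, source)]


def inconsistency(pe, train, test):
    """Extra error paid for training on `train` rather than on `test` itself.

    Assumed reading of (in)consistency between sources; 0 means no mismatch cost.
    """
    return pe[(train, test)] - pe[(test, test)]


# Made-up error rates keyed by (training source, testing source).
# "A" and "B" are placeholder source names; the values are illustrative only.
pe = {
    ("A", "A"): 0.10, ("A", "B"): 0.35,
    ("B", "A"): 0.20, ("B", "B"): 0.25,
}

for train, test in [("A", "B"), ("B", "A")]:
    print(
        f"train={train} test={test} "
        f"difficulty({test})={intrinsic_difficulty(pe, test):.2f} "
        f"inconsistency={inconsistency(pe, train, test):.2f}"
    )
```

Under this reading, a source can be hard to steganalyze on its own (high intrinsic difficulty) while still being a good training proxy for another source (low inconsistency), which is why the two quantities are kept separate.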
This paper briefly summarizes the ALASKA#2 steganalysis challenge, which was organized on the Kaggle machine learning competition platform. We focus especially on the context, the organization (rules, timeline, evaluation and material) and the outcome (number of competitors, submissions, findings, and final results). While both steganography and steganalysis were new to most of the competitors, they were able to leverage their skills in Deep Learning to design detection methods that perform significantly better than the current state of the art in steganalysis. Although these solutions come at a significant computational cost, they clearly indicate new directions to explore in steganalysis research.
The current state of the art in schemes using a deflection criterion, such as MiPOD, for JPEG steganography is either under-performing or on par with distortion-based schemes. We link this lack of performance to a poor estimation of the variance of the model of the noise on the cover image. In this paper, we propose a method to better estimate the variances of DCT coefficients by taking into account the dependencies between pixels that come from the development pipeline. Using this estimate, we are able to extend statistically-informed steganographic schemes to the JPEG domain while significantly outperforming the current state-of-the-art JPEG steganography. An extension of Gaussian Embedding in the JPEG domain using quantization error as side-information is also formulated and shown to attain state-of-the-art performance.