We address the issue of underdetermined source separation in a particular informed configuration where both the sources and the mixtures are known during a so-called encoding stage. This knowledge enables the computation of side-information that is small enough to be inaudibly embedded into the mixtures. At the decoding stage, the sources are no longer assumed to be known; only the mixtures and the extracted side-information are processed for source separation. The proposed system models the sources as independent and locally stationary Gaussian processes (GP) and the mixing process as linear filtering. This model allows reliable estimation of the sources through generalized Wiener filtering, provided their spectrograms are known. As these spectrograms are too large to be embedded in the mixtures, we show how they can be efficiently approximated using either Nonnegative Tensor Factorization (NTF) or image compression. The system uses a high-capacity embedding method to inaudibly embed the separation side-information into the mixtures. This method applies the Quantization Index Modulation technique to the time-frequency coefficients of the mixtures and achieves embedding rates of about 250 kbps. Finally, a study of the performance of the full system is presented.
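The Wiener filtering step mentioned above can be illustrated with a minimal sketch. It assumes a much simpler setting than the abstract describes: a single-channel, instantaneous mixture in which each time-frequency coefficient of the mixture is the sum of the source coefficients, whereas the paper models the mixing as linear filtering. The function name `wiener_separate`, the array layout, and the small constant `eps` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def wiener_separate(mix_tf, source_psds, eps=1e-12):
    """Estimate source time-frequency coefficients from a single mixture.

    mix_tf      : complex array, shape (F, T), mixture coefficients.
    source_psds : nonnegative array, shape (J, F, T), one power
                  spectrogram per source (assumed known at the decoder).
    eps         : assumed small constant that avoids division by zero.
    """
    total = source_psds.sum(axis=0) + eps   # mixture power under independence
    gains = source_psds / total             # per-source Wiener gains in [0, 1]
    return gains * mix_tf[None, :, :]       # shape (J, F, T)

# Toy data: 2 sources, 4 frequency bins, 5 frames (values are arbitrary).
rng = np.random.default_rng(0)
psds = rng.uniform(0.1, 1.0, size=(2, 4, 5))
mix = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
estimates = wiener_separate(mix, psds)
```

Because the gains sum to one in every time-frequency bin, the source estimates add back up to the mixture, which is a convenient sanity check for this kind of filter.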
In this paper we propose a high-rate data hiding technique for audio signals suitable for non-secure applications that require a large bit rate but no particular robustness to attacks. More particularly, the proposed technique is suitable for enriched-content applications involving uncompressed PCM audio signals, as used in audio-CD and .wav formats. It applies the Quantization Index Modulation (QIM) technique to the Modified Discrete Cosine Transform (MDCT) or Integer MDCT (IntMDCT) coefficients of the signal. The basic principle is that if these coefficients can be significantly modified by quantization in perceptual audio compression with very moderate quality impairments, they can also be modified to embed data. Following audio compression principles, a Psychoacoustic Model (PAM) is used at the embedding stage to account for the properties of the human auditory system and meet the inaudibility constraint. The PAM is used to estimate the number of bits to be embedded in each MDCT coefficient for each frame. The resulting set of values is transmitted to the decoder as a minor part of the total embedded side-information. To this end, a specific fixed embedding space is allocated in the high frequencies of the spectrum. With this technique, simulations on real audio signals show that bit rates of about 250 kbps per audio channel can be reached (depending on the audio content).
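The QIM principle described above can be sketched as a scalar quantizer whose grid is shifted according to the symbol to embed. This is a minimal illustration under stated assumptions: the coefficients are plain real numbers standing in for MDCT coefficients, and the step size `delta` and the bit count `bits` are fixed by hand, whereas the paper derives the number of bits per coefficient from a psychoacoustic model. The names `qim_embed` and `qim_extract` are hypothetical.

```python
import numpy as np

def qim_embed(coeffs, symbols, delta, bits):
    """Embed one `bits`-bit symbol per coefficient with scalar QIM.

    Each symbol selects one of 2**bits quantizers; all share the step
    `delta` and differ only by an offset of symbol * delta / 2**bits.
    """
    levels = 2 ** bits
    offset = symbols * (delta / levels)
    return np.round((coeffs - offset) / delta) * delta + offset

def qim_extract(marked, delta, bits):
    """Recover the symbols by locating each value on the fine grid."""
    levels = 2 ** bits
    return np.round(marked / (delta / levels)).astype(int) % levels

# Toy data: 8 coefficients, 2 bits per coefficient (values are arbitrary).
rng = np.random.default_rng(1)
coeffs = rng.standard_normal(8)
symbols = rng.integers(0, 4, size=8)
delta, bits = 0.5, 2
marked = qim_embed(coeffs, symbols, delta, bits)
recovered = qim_extract(marked, delta, bits)
```

Each coefficient moves by at most `delta / 2`, so `delta` sets the embedding distortion while `bits` sets the payload. Without added noise, extraction returns the embedded symbols exactly, which matches the non-robust, high-rate use case described in the abstract.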