The increasing development of peer-to-peer networks for delivering and sharing multimedia files poses the problem of how to protect this content from unauthorized and possibly malicious manipulation. In the past few years, a large number of techniques, including multimedia hashes and digital watermarking, have been proposed to identify whether a multimedia content has been illegally tampered with. Nevertheless, very few efforts have been devoted to identifying which kind of attack has been carried out, with the aim of assessing whether the modified content is still meaningful for the final user and, hopefully, of recovering the original content semantics. One of the main issues that has prevented multimedia hashes from being used for tampering identification is the large amount of data required for this task: in general, the size of the hash should be kept as small as possible to reduce the bandwidth overhead. To overcome this limitation, we propose a novel hashing scheme that exploits the paradigms of compressive sensing and distributed source coding to generate a compact hash signature, and we apply it to the case of audio content protection. The audio content provider produces a small hash signature by computing a limited number of random projections of a perceptual, time-frequency representation of the original audio stream; the audio hash is given by the syndrome bits of an LDPC code applied to the projections. At the content-user side, the hash is decoded using distributed source coding tools, provided that the distortion introduced by tampering is not too high. If the tampering is sparsifiable or compressible in some orthonormal basis or redundant dictionary (e.g., DCT or wavelet), it is possible to identify the time-frequency position of the attack with a hash size as small as 200 bits/second; the bit saving obtained by introducing distributed source coding ranges from 20% to 70%.
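A minimal sketch of the hash-generation step just described, assuming NumPy: the content provider projects a magnitude spectrogram onto a seeded random matrix. The LDPC syndrome-coding stage is beyond a short example, so 1-bit quantization stands in for it here; all names and parameters are illustrative, not the paper's.

```python
import numpy as np

def audio_hash(signal, frame_len=1024, n_proj=16, seed=0):
    """Random projections (compressive sensing) of a magnitude spectrogram."""
    # Perceptual time-frequency representation, simplified here to a
    # magnitude STFT over non-overlapping Hann-windowed frames.
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectrogram = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

    # Seeded Gaussian sensing matrix, shared by provider and user.
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((n_proj, spectrogram.shape[1]))

    # A few random projections per frame; 1-bit quantization stands in
    # for the LDPC syndrome bits of the actual scheme.
    projections = spectrogram @ phi.T
    return (projections > np.median(projections, axis=0)).astype(np.uint8)
```

At the user side, the same seed regenerates the sensing matrix, so the hash can be compared against projections of the received (possibly tampered) stream.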
In this paper we present a novel video coding scheme to compress stereo video sequences. We consider a wireless sensor network scenario, in which the sensing nodes cannot communicate with each other and have limited computational capabilities. The joint decoder exploits both the temporal and the inter-view correlation to generate the side information. To this end, we propose a fusion algorithm that adaptively selects either the temporal or the inter-view side information on a pixel-by-pixel basis. In addition, the coding algorithm is symmetric with respect to the two cameras. We also propose a practical stopping criterion for turbo decoding that determines when decoding is successful. Experimental results on stereo video sequences show that the proposed scheme achieves a coding efficiency gain of up to 4 dB at high bit-rates.
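As an illustration of the pixel-by-pixel fusion idea, the hedged NumPy sketch below selects, at each pixel, whichever side-information candidate is closer to the previously decoded frame. This reliability proxy is an assumption made for the example, not necessarily the fusion criterion used in the paper.

```python
import numpy as np

def fuse_side_information(si_temporal, si_interview, prev_decoded):
    """Per-pixel choice between temporal and inter-view side information."""
    # Reliability proxy: absolute difference against the previously
    # decoded frame (an assumption made for this sketch).
    err_t = np.abs(si_temporal.astype(np.int16) - prev_decoded.astype(np.int16))
    err_v = np.abs(si_interview.astype(np.int16) - prev_decoded.astype(np.int16))
    # Keep, at each pixel, the candidate with the smaller error.
    return np.where(err_t <= err_v, si_temporal, si_interview)
```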
Tangible Acoustic Interfaces (TAIs) are innovative acoustic human-machine interaction devices. A number of contact sensors distributed on a surface acquire the vibrational signal generated by the interaction between the surface and an object moved by the user; the signal is then analyzed to recognize what the user is doing on the device. The use of vibrational sensors also naturally opens the way to classification and recognition applications. In this paper, a system for audio-based recognition of interaction objects is presented. The aim of the system is to recognize which object the user is employing to interact with the TAI, by exploiting feature analysis and classification techniques. In particular, a frame-by-frame SVM-based classifier architecture performs the object recognition, and the result is then filtered to eliminate possible classification outliers. By training and testing our system on signals from four interaction objects at different signal-to-noise ratios, we reach accuracies between 73% and 100%, depending on the object used, the quality of the acquired signal, and the optional use of the classification filtering algorithm.
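The sketch below, assuming scikit-learn and precomputed per-frame feature vectors, shows one plausible form of the frame-by-frame SVM classification followed by outlier filtering (here, a majority vote over a sliding window); the kernel, window length, and features are illustrative choices, not those reported in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def train_frame_classifier(X_train, y_train):
    """Fit an SVM on per-frame feature vectors (one object label per frame)."""
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(X_train, y_train)
    return clf

def classify_and_filter(clf, X_frames, win=9):
    """Predict a label per frame, then majority-vote away isolated outliers."""
    labels = clf.predict(X_frames)
    filtered = labels.copy()
    half = win // 2
    for i in range(half, len(labels) - half):
        vals, counts = np.unique(labels[i - half : i + half + 1],
                                 return_counts=True)
        filtered[i] = vals[np.argmax(counts)]
    return filtered
```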
In this paper we present a novel hierarchical and scalable three-stage algorithm for semantic segmentation of musical audio. In the first stage, the energy spectrum of the entire audio track is analyzed to find significant energy textures that may characterize different semantic segments; in the second and third stages, tonal and timbral features are used to refine the segmentation by moving or deleting segment boundaries. Experimental results on a set of 58 songs show that our algorithm attains good semantic segmentation after the first stage alone, with a precision of 64% and a recall of 96%. After the second stage the precision increases to 79%; the best precision, 85%, is reached after the third stage, where the recall drops to its minimum average value of 92%.
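For reference, boundary precision and recall of the kind quoted above are commonly computed by matching estimated boundaries to reference ones within a fixed time tolerance. The sketch below assumes boundaries expressed in seconds and an illustrative 3 s tolerance, which may differ from the evaluation protocol actually used.

```python
import numpy as np

def boundary_precision_recall(estimated, reference, tol=3.0):
    """Precision/recall of estimated segment boundaries within `tol` seconds."""
    estimated = np.asarray(estimated, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # An estimated boundary is a hit if some reference boundary is close;
    # recall is computed symmetrically from the reference side.
    hits_p = sum(np.abs(reference - b).min() <= tol for b in estimated)
    hits_r = sum(np.abs(estimated - b).min() <= tol for b in reference)
    precision = hits_p / len(estimated) if len(estimated) else 0.0
    recall = hits_r / len(reference) if len(reference) else 0.0
    return precision, recall
```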