Many pattern recognition tasks can be modeled as proximity searching, where the common task is to quickly find all the elements close to a given query without sequentially scanning a very large database. A recent shift in the searching paradigm uses permutations instead of distances to predict proximity. Every object in the database records how it sees a set of reference objects (the permutants), i.e. only their relative positions are used. When a query arrives, the relative displacements of the permutants between the query and a particular object are measured. This approach turned out to be the most efficient and scalable, at the expense of losing recall in the answers. In practice, the permutation of every object is represented with κ short integers, producing bulky indexes of 16κn bits. In this paper we show how to represent the permutation as a binary vector, using just one bit for each permutant (instead of log κ in the plain representation). The Hamming distance between binary signatures is then used to predict proximity between objects in the database. We tested this approach with many real-life metric databases, obtaining faster queries with a recall close to that of the Spearman ρ while using 16 times less space.
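The idea of a one-bit-per-permutant signature can be illustrated with a minimal Python sketch. The abstract does not state how each bit is derived, so the thresholding rule used here (a bit is set when a permutant is displaced from its identity position by more than `threshold`) is an assumption made for illustration only, and all function names are hypothetical.

```python
def permutation_ranks(obj, permutants, dist):
    """Return ranks[i] = position of permutant i when the permutants
    are sorted by increasing distance to obj (ties broken by index)."""
    order = sorted(range(len(permutants)),
                   key=lambda i: (dist(obj, permutants[i]), i))
    ranks = [0] * len(permutants)
    for position, i in enumerate(order):
        ranks[i] = position
    return ranks


def binary_signature(ranks, threshold):
    """Pack one bit per permutant into an int.

    ASSUMPTION (not taken from the abstract): bit i is 1 when permutant i
    sits more than `threshold` positions away from its identity position i.
    """
    signature = 0
    for i, rank in enumerate(ranks):
        if abs(rank - i) > threshold:
            signature |= 1 << i
    return signature


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def spearman_rho(ranks_a, ranks_b):
    """Sum of squared rank differences, the plain permutation distance."""
    return sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))


# Toy 1-D metric space: four permutants on the real line.
permutants = [0.0, 10.0, 20.0, 30.0]
dist = lambda x, y: abs(x - y)

ranks_near = permutation_ranks(1.0, permutants, dist)   # close to permutant 0
ranks_far = permutation_ranks(29.0, permutants, dist)   # close to permutant 3
sig_near = binary_signature(ranks_near, threshold=1)
sig_far = binary_signature(ranks_far, threshold=1)
```

With κ permutants, each signature here occupies κ bits instead of the 16κ bits needed for κ short integers, which is where the 16-fold space saving quoted in the abstract comes from.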
In this paper we propose a new technique to characterize audio signals. We use Shannon's entropy to estimate the level of information content per chroma, and we show that involving entropy contributes to a more robust audio characterization. We propose a new audio fingerprint (AFP) based on this feature, which we call the Entropy-Chroma Fingerprint (ECFP). Two approaches were considered to estimate entropy: the first assumes that the spectral coefficients are normally distributed, while the second estimates their probability density function (PDF) with the Parzen windows estimation method. We compared the robustness of the ECFP against the Chromagram-Based Audio Fingerprint (CBFP), which is determined using the Constant Q Transform (CQT). Three thousand five hundred AFPs were determined from songs of several genres. A subset of 350 songs was severely degraded and searched for using excerpts of 5 seconds. The ECFP determined assuming Gaussianity of the PDF turned out to be much more robust than the CBFP. It is also much faster to compute than both the CBFP and the ECFP determined with Parzen windows, while still being more robust.
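The two entropy estimators mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be a plain list of real-valued coefficients for one chroma, the Gaussian case uses the closed-form differential entropy of a normal distribution, and the Parzen case uses a Gaussian kernel with a resubstitution estimate. The function names, the variance floor, and the bandwidth `h` are hypothetical and are not taken from the paper.

```python
import math
import statistics


def gaussian_entropy(samples, min_variance=1e-12):
    """Differential entropy (in nats) of a normal distribution fitted to
    the samples: H = 0.5 * ln(2 * pi * e * sigma^2).

    ASSUMPTION: the variance is floored at `min_variance` so that constant
    input does not produce log(0).
    """
    variance = max(statistics.pvariance(samples), min_variance)
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def parzen_density(x, samples, h):
    """Parzen window estimate of the PDF at x with a Gaussian kernel of
    bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def parzen_entropy(samples, h):
    """Resubstitution entropy estimate: minus the mean log-density
    evaluated at the samples themselves."""
    return -sum(math.log(parzen_density(x, samples, h))
                for x in samples) / len(samples)


# Toy data standing in for the coefficients of one chroma.
narrow = [0.0, 0.1, -0.1, 0.05]
wide = [0.0, 2.0, -2.0, 1.0]
h_gauss_narrow = gaussian_entropy(narrow)
h_gauss_wide = gaussian_entropy(wide)
h_parzen_narrow = parzen_entropy(narrow, h=0.3)
h_parzen_wide = parzen_entropy(wide, h=0.3)
```

The Gaussian estimate needs a single pass to obtain the variance, whereas the Parzen estimate evaluates the kernel for every pair of samples, which is consistent with the speed difference reported in the abstract.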
Monitoring media broadcast content has received a lot of attention lately from both academia and industry due to the technical challenge involved and its economic importance (e.g. in advertising). The problem poses a unique challenge from the pattern recognition point of view because a very high recognition rate is needed under non-ideal conditions. The problem consists of comparing a small audio sequence (the commercial ad) with a large audio stream (the broadcast), searching for matches. In this paper we present a solution based on the Multi-Band Spectral Entropy Signature (MBSES), which is very robust to degradations commonly found on amplitude modulated (AM) radio. Using the MBSES we obtained perfect recall (all ad occurrences were accurately found, with no false positives) in 95 hours of audio from five different AM radio broadcasts. Our system is able to scan one hour of audio in 40 seconds if the audio is already fingerprinted (e.g. by a separate slave computer), and takes a total of five minutes per hour of audio including fingerprint extraction, using a single-core off-the-shelf desktop computer with no parallelization.
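The matching step described above, locating a short fingerprinted ad inside a long fingerprinted stream, can be sketched as a sliding-window comparison. The abstract does not describe the MBSES layout or the matching rule, so this sketch assumes each frame of a fingerprint is an integer holding `bits_per_frame` bits and that a match is declared when the bit error rate falls below a threshold. The function name and parameters are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two frame signatures."""
    return bin(a ^ b).count("1")


def find_occurrences(ad, stream, bits_per_frame, max_bit_error_rate):
    """Return every offset in `stream` where `ad` matches.

    ASSUMPTION: `ad` and `stream` are lists of ints, one per frame, each
    holding `bits_per_frame` signature bits. A window matches when the
    fraction of differing bits is at most `max_bit_error_rate`.
    """
    if not ad or len(ad) > len(stream):
        return []
    total_bits = len(ad) * bits_per_frame
    hits = []
    for offset in range(len(stream) - len(ad) + 1):
        errors = sum(hamming(frame, stream[offset + k])
                     for k, frame in enumerate(ad))
        if errors / total_bits <= max_bit_error_rate:
            hits.append(offset)
    return hits


# Toy fingerprints with 4 bits per frame. The ad appears at offset 1 with
# one flipped bit (simulated degradation) and at offset 5 unchanged.
ad = [0b1010, 0b0110, 0b1111]
stream = [0b0000, 0b1010, 0b0110, 0b1110, 0b0001, 0b1010, 0b0110, 0b1111]
hits = find_occurrences(ad, stream, bits_per_frame=4, max_bit_error_rate=0.1)
```

This brute-force scan costs time proportional to the product of the two fingerprint lengths. It is meant only to show the comparison itself, not the performance figures reported in the abstract.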
In Automatic Speech Recognition, Voice Synthesis, Speaker Identification and the identification of laryngeal diseases, it is critical to classify speech segments as voiced or unvoiced. Several techniques have been proposed for this problem during the last twenty years. Unfortunately, they either have special cases where the result is unreliable, or they need to use not only the present segment of speech but the next one as well, which limits their applications (e.g. Continuous Speech Recognition). In this paper we present an alternative for voiced/unvoiced classification using a Discretization of the Continuous Fourier Transform.