2010
DOI: 10.1109/tasl.2009.2023170

Efficient and Robust Music Identification With Weighted Finite-State Transducers

Abstract: We present an approach to music identification based on weighted finite-state transducers and Gaussian mixture models, inspired by techniques used in large-vocabulary speech recognition. Our modeling approach is based on learning a set of elementary music sounds in a fully unsupervised manner. While the space of possible music sound sequences is very large, our method enables the construction of a compact and efficient representation for the song collection using finite-state transducers. This paper gi…
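The abstract describes a pipeline in which each song is transcribed into a sequence of learned elementary sound units and the whole collection is then compiled into a compact transducer that maps audio fragments back to song identities. As a rough illustration of the indexing idea only, and not the paper's actual weighted-transducer construction, the Python sketch below assumes songs have already been decoded into discrete sound-unit labels and matches a snippet with a simple inverted index over fixed-length unit windows; the window length and toy data are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical window length, in sound units, used only for this sketch.
WINDOW = 5

def build_index(songs):
    """songs: dict mapping song_id -> list of discrete sound-unit labels.

    Returns an inverted index from each fixed-length window of units to the
    set of songs containing that window.
    """
    index = defaultdict(set)
    for song_id, units in songs.items():
        for i in range(len(units) - WINDOW + 1):
            index[tuple(units[i:i + WINDOW])].add(song_id)
    return index

def identify(index, snippet_units):
    """Vote for the song whose unit windows best cover the snippet."""
    votes = Counter()
    for i in range(len(snippet_units) - WINDOW + 1):
        for song_id in index.get(tuple(snippet_units[i:i + WINDOW]), ()):
            votes[song_id] += 1
    return votes.most_common(1)[0][0] if votes else None

if __name__ == "__main__":
    # Toy data: integer labels stand in for the learned music sound units.
    songs = {
        "song_a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "song_b": [9, 8, 7, 6, 5, 4, 3, 2, 1],
    }
    idx = build_index(songs)
    print(identify(idx, [3, 4, 5, 6, 7, 8]))  # -> "song_a"
```

Per the abstract, the paper's approach instead represents the entire collection as a compact finite-state transducer over the unit sequences, which supports matching contiguous fragments of any length rather than fixed-size windows.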

Cited by 14 publications (10 citation statements). References 15 publications.
“…Accuracy for multi-conditional training Front-end. We test the modifications we propose in Section 2.1 to our baseline speech recognition front-end (similar to that reported in [1]) by training an acoustic model on CLEAN, building the FST from the songs in INDEX, and testing the modifications on the four sets of snippets. As shown in Table 2, adapting the front-end to the task of music recognition consistently improved identification accuracy across the different test sets.…”
Section: Results (mentioning)
confidence: 99%
“…The above system achieved 99.9% identification accuracy over test snippets cut from clean recordings, and in [6,1] we showed that the system was robust to synthetic distortions. Nevertheless, we observed a significant degradation in accuracy when the test recordings were recorded with mobile phones.…”
Section: Modeling (mentioning)
confidence: 91%