Hierarchical filtering method for content-based music retrieval via acoustic input

Jang, Jyh-Shing Roger; Lee, Hong-Ru

doi:10.1145/500141.500201

Cited by 63 publications

(13 citation statements)

References 12 publications

“…, n. These two vectors are not necessarily of the same size, and we can apply DTW to match each point of the test vector to that of the reference vector in an optimal way. That is, we want to construct an m × n DTW table D(i, j) according to dynamic programming and then identify the optimal mapping (or path) from each point of the test vector t to that of the reference vector r. The exact formula of DTW for our MIRAI engine can be found in Jang and Kao (2000) and Jang and Lee (2001a) and will not be repeated here. Note that when we construct a DTW table, we can force the first point of t to match the first point of r. This case of "match beginning" represents the situation that the user sings/hums from the beginning of a song.…”

Section: Query By Singing/humming (mentioning)

confidence: 99%
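
The DTW table described in the statement above can be illustrated with a minimal sketch. This is not the formula used in the MIRAI engine, which the statement defers to Jang and Kao (2000) and Jang and Lee (2001a). The sketch assumes a textbook step pattern, absolute pitch difference as the local cost, and a free end point (the query may stop anywhere in the reference). The function name `dtw_distance` and the `match_beginning` flag are hypothetical.

```python
import math


def dtw_distance(t, r, match_beginning=True):
    """Fill an m x n DTW table D(i, j) for test vector t and reference r.

    Assumptions (not taken from the cited papers): local cost is the
    absolute pitch difference, the step pattern is the textbook one
    (from (i-1, j), (i, j-1), or (i-1, j-1)), and the path may end at
    any column of the last row.
    """
    m, n = len(t), len(r)
    if m == 0 or n == 0:
        raise ValueError("both vectors must be non-empty")

    D = [[math.inf] * n for _ in range(m)]

    # First row: decides where a warping path is allowed to start.
    for j in range(n):
        cost = abs(t[0] - r[j])
        if j == 0 or not match_beginning:
            # Anchored start at r[0], or a free start at any r[j].
            D[0][j] = cost
        else:
            # "Match beginning": every path must pass through (0, 0).
            D[0][j] = D[0][j - 1] + cost

    # Remaining rows: standard dynamic-programming recurrence.
    for i in range(1, m):
        for j in range(n):
            cost = abs(t[i] - r[j])
            best = D[i - 1][j]
            if j > 0:
                best = min(best, D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = cost + best

    # Free end point: the query may cover only part of the reference.
    return min(D[m - 1])
```

With `match_beginning=True` a query that starts in the middle of a song is penalized, because its first frame is forced onto the first frame of the reference; with `match_beginning=False` the same query can align anywhere.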

“…We have tried the difference operator on the pitch vector and found that the operator tends to amplify noises and deteriorate the system's performance. Thus we employ a heuristic to shift the input pitch vector 5 times to achieve a minimum DTW distance when comparing with a candidate song (Jang & Lee, 2001a). The system then returns a ranked song list according to the computed similarity scores.…”

Section: Query By Singing/humming (mentioning)

confidence: 99%
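
The key-shifting heuristic mentioned in the statement above can be sketched as follows. The statement does not say how the five shifts are chosen, so this sketch assumes five candidate offsets centered on the offset that aligns the mean pitch of the query with the mean pitch of the start of the reference. The names `best_shift_distance` and `prefix_l1` are hypothetical, and `prefix_l1` is a simple stand-in for the DTW distance, not the distance used by the authors.

```python
def best_shift_distance(query, reference, distance,
                        offsets=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Try a few pitch shifts of the query and keep the smallest distance.

    Returns (min_distance, shift). The choice of offsets (in semitones)
    is an assumption made for illustration.
    """
    if not query or not reference:
        raise ValueError("both vectors must be non-empty")

    # Align the mean pitch of the query with the start of the reference,
    # so the offsets only need to explore a small neighborhood.
    head = reference[:len(query)]
    base = sum(head) / len(head) - sum(query) / len(query)

    best = None
    for off in offsets:
        shift = base + off
        shifted = [p + shift for p in query]
        d = distance(shifted, reference)
        if best is None or d < best[0]:
            best = (d, shift)
    return best


def prefix_l1(a, b):
    """Stand-in distance: sum of absolute differences over the common prefix."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Any distance function with the signature `distance(query, reference)` can be passed in, including a DTW implementation.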

“…We can also apply a hierarchical filtering method that first filter out 90% unlikely candidates in a quick and dirty manner and then leave the remaining 10% for detailed comparison. A detailed analysis of the hierarchical filtering method can be found in Jang and Lee (2001a).…”

Section: Query By Singing/humming (mentioning)

confidence: 99%
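
The two-stage idea in the statement above (discard most candidates cheaply, then compare the rest in detail) can be sketched as below. This is an illustration under stated assumptions, not the method analyzed in Jang and Lee (2001a): the names `hierarchical_search`, `coarse_l1`, and `fine_l1` are hypothetical, the cheap measure is a distance on a downsampled pitch vector, and the detailed measure is a plain frame-wise distance standing in for DTW.

```python
import math


def hierarchical_search(query, songs, coarse, fine, keep_ratio=0.1):
    """Rank songs in two passes.

    songs maps a song name to its pitch vector. The cheap measure
    `coarse` keeps only the best `keep_ratio` fraction of candidates
    (at least one); the expensive measure `fine` ranks the survivors.
    """
    if not songs:
        return []

    # Stage 1: quick filter over the whole database.
    ranked = sorted(songs, key=lambda name: coarse(query, songs[name]))
    keep = max(1, math.ceil(len(ranked) * keep_ratio))
    survivors = ranked[:keep]

    # Stage 2: detailed comparison on the survivors only.
    return sorted(survivors, key=lambda name: fine(query, songs[name]))


def coarse_l1(a, b, step=4):
    """Cheap stand-in: compare every `step`-th frame only."""
    return sum(abs(x - y) for x, y in zip(a[::step], b[::step]))


def fine_l1(a, b):
    """Detailed stand-in: compare every frame (DTW would go here)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

The trade-off is that a song wrongly discarded in the first pass can never be recovered in the second, so the cheap measure should be chosen to err on the side of keeping candidates.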

Research and developments of a multi‐modal MIR engine for commercial applications in East Asia

Jang, Lee, Chen, et al. 2004

J. Am. Soc. Inf. Sci.

Self Cite

This article describes the research and development of an efficient Music Information Retrieval (MIR) engine that is embedded in a karaoke software package targeted for Asian people's need of music retrieval. The MIR engine has a multi-modal interface that allows queries by singing, humming, tapping, speaking, and writing. In particular, we discuss the design philosophy, technical barriers, and performance evaluation of such an engine, as well as its current and potential commercial applications. Feedbacks and feature requests from users, which greatly influence our future work, are also addressed.

“…In this task, note-based and frame-based similarity measures are two commonly used methods. Jang proposed a frame-based template matching strategy by calculating time series similarity with high precision [2], but this method is very time-consuming when the template database growing larger. Typke used transportation distance (EMD), which is a variation of note-based measurement, to achieve satisfying retrieval speed comparing to the frame-based method but loss of precision in some degrees [3].…”

Section: Introduction (mentioning)

confidence: 99%

An effective and efficient method for query by humming system based on multi-similarity measurement fusion

Wang, Huang, et al. 2008

2008 International Conference on Audio, Language and Image Processing

Since it is the most natural way for people to search a specific melody in large music database, query by humming/singing is attracting more and more researchers' attention in the field of content-based music information retrieval. In this task, note-based and frame-based similarity measures are two commonly used methods. However, in previous works, researchers always focus on one of the two methods alone. In this paper, we propose a novel scheme taking advantage of two different similarity measurements to improve not only the retrieval accuracy but also the retrieving speed. First, Earth Mover's Distance (EMD), which is note-based and much faster, is adopted to eliminate most unlikely candidate. Then, Dynamic Time Warping (DTW), which is frame-based and more accurate, is executed on these surviving candidates. Finally, fusion strategies of these two similarity measurements are employed to improve the performance of whole system. Experiments show our approach can achieve 92.9% accuracy on the database used in MIREX 2006 QBH contest, which is better than those systems participated in that task

“…Even professional singers do not necessarily present error-free queries to MIR systems [31]- [33], [37], because they may not always recall the theme perfectly. To handle such errors, various approximate matching methods, such as dynamic time warping (DTW) [5], [6], [13], [23], [24], the hidden Markov model [15], [31], and the N-gram model [11], [14], have been developed, with DTW being the most popular. However, due to the considerable computational time required for DTW, several speed-up methods have been proposed [5], [23], [24], [31], so that a large-scale music database can be searched more efficiently.…”

mentioning

confidence: 99%

A Query-by-Singing System for Retrieving Karaoke Music

Tsai, Wang 2008

IEEE Trans. Multimedia

This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to: i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively; and ii) detect phrase onsets based on the Bayesian information criterion (BIC) to predict the onset points of a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass dynamic time warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarities of melodies. The results of experiments conducted on a karaoke database comprised of 1071 popular songs demonstrate the feasibility of query-by-singing retrieval for karaoke music.

Index Terms: Bayesian information criterion, dynamic time warping, karaoke, music information retrieval, query-by-singing.
