Abstract: This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characte…
“…The query may be transposed by, e.g., all possible numbers of semitones within the octave (Yu et al, 2008) or from −5 to +5 semitones in half-of-the-semitone steps (Jang et al, 2011). Various numbers of repetitions may be considered but in any way this is clearly a brute-force approach which increases the computational complexity significantly.…”
Section: Melody Matching (mentioning)
confidence: 99%
“…In general, the proposed tune follower and its adaptive variant enable to efficiently refine the results without computationally complex methods such as repeating the DTW for all possible transpositions (Yu et al, 2008). It should be noted that they can be used independently from efficient indexing techniques (Zhu, Shasha, 2003; Keogh, 2002) or note-based approximate algorithms to increase the speed and reliability of a QBSH-based search engine.…”
Dynamic Time Warping is a standard algorithm used for matching time series irrespective of local tempo variations. Its application in the context of Query-by-Humming interface to multimedia databases requires providing the transposition independence, which involves some additional, sometimes computationally expensive processing and may not guarantee the success, e.g., in the presence of a pitch trend or accidental key changes. The method of tune following, proposed in this paper, enables solving the pitch alignment problem in an adaptive way inspired by the human ability of ignoring typical errors occurring in sung melodies. The experimental validation performed on the database containing 4431 queries and over 5000 templates confirmed the enhancement introduced by the proposed algorithm in terms of the global recognition rate.
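The brute-force baseline that the citation statements above contrast with tune following can be sketched as follows. This is a minimal illustration, not the method of any of the cited papers: it assumes pitch sequences expressed in semitones, an absolute-difference local cost, and the −5 to +5 semitone range in half-semitone steps mentioned in the quote. The function names `dtw_distance` and `best_transposition` are hypothetical.

```python
from math import inf


def dtw_distance(query, template):
    """Accumulated DTW cost between two pitch sequences (in semitones).

    Assumption: the local cost is the absolute pitch difference and the
    allowed steps are the three classic ones (insert, delete, match).
    """
    n, m = len(query), len(template)
    # cost[i][j] = cheapest alignment of query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - template[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query advances, template repeats
                cost[i][j - 1],      # template advances, query repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_transposition(query, template, low=-5.0, high=5.0, step=0.5):
    """Repeat DTW for every candidate shift and keep the cheapest one."""
    best_shift, best_cost = None, inf
    steps = int(round((high - low) / step))
    for k in range(steps + 1):
        shift = low + k * step
        shifted = [p + shift for p in query]
        c = dtw_distance(shifted, template)
        if c < best_cost:
            best_shift, best_cost = shift, c
    return best_shift, best_cost


# Hypothetical data: the query is the template sung 3 semitones lower,
# with one note held longer (a local tempo variation).
template = [60, 62, 64, 65, 67]
query = [57, 59, 59, 61, 62, 64]
shift, cost = best_transposition(query, template)
```

With the default range, each query-template pair requires 21 full DTW runs, which is the extra computational cost the quoted passage refers to.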
Abstract: Searching huge music datasets with a query given as a fragment of the desired song, when no other information is available, is a particular concern in content-based music information retrieval (MIR) and is known as query-by-example (QBE). A number of QBE-based MIR systems have emerged in recent years. They search for a desired song without any metadata about its origin, such as title, composer, or singer, and return a list of songs ranked in descending order of similarity to the query, which the user may have recorded from a TV, in a gym, and so on. Although much attention has been paid to this topic by researchers and developers in several communities, such as information retrieval, data mining, and multimedia browsing engines, the field still lacks a unified definition of structure, aim, similarity, performance, and output results. This paper gives a brief overview of available QBE-based MIR systems to show the variety, opportunities, and challenges in this area.
“…The comprehensive online music discography Discogs.com lists over 200,000 releases containing an instrumental mix but only about 40,000 which include an a cappella mix. The availability of these separated mixes are crucial in the creation and performance of some genres of music [3,4,5]. These instrumental and a cappella versions can also be used as ground-truth for vocal removal or isolation algorithms [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…A simple approach is proposed in [5] where an optimally shifted and scaled instrumental mix is subtracted from the complete mix in the time or frequency domain in attempt to obtain a (previously unavailable) a cappella mix. However, this approach does not cover the more general case where different mixes may be extracted from different media (e.g.…”
We consider the situation where there are multiple audio signals whose relationship is of interest. If these signals have been differently captured, the otherwise similar signals may be distorted by fixed filtering and/or unsynchronized timebases. Examples include recordings of signals before and after radio transmission and different versions of musical mixes obtained from CDs and vinyl LPs. We present techniques for estimating and correcting timing and channel differences across related signals. Our approach is evaluated in the context of artificially manipulated speech utterances and two source separation tasks.
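The simple time-domain approach described in the citation statement above (subtracting a shifted and scaled instrumental mix from the complete mix) can be sketched as follows. This is only an illustration under stated assumptions, not the estimator used in the cited work: the shift is assumed to be an integer number of samples found by brute-force cross-correlation, and the scale is a single least-squares gain. The names `best_lag` and `subtract_instrumental` are hypothetical.

```python
def best_lag(mix, inst, max_lag):
    """Integer lag d that maximises the cross-correlation sum(mix[n] * inst[n - d])."""
    def corr(d):
        return sum(
            mix[n] * inst[n - d]
            for n in range(len(mix))
            if 0 <= n - d < len(inst)
        )
    return max(range(-max_lag, max_lag + 1), key=corr)


def subtract_instrumental(mix, inst, max_lag=8):
    """Align inst to mix, scale it by a least-squares gain, and subtract it."""
    d = best_lag(mix, inst, max_lag)
    # Delay (or advance) the instrumental by d samples, zero-padding the edges.
    shifted = [inst[n - d] if 0 <= n - d < len(inst) else 0.0
               for n in range(len(mix))]
    energy = sum(s * s for s in shifted)
    # Gain minimising sum((mix - gain * shifted) ** 2).
    gain = sum(m * s for m, s in zip(mix, shifted)) / energy if energy else 0.0
    residual = [m - gain * s for m, s in zip(mix, shifted)]
    return d, gain, residual


# Hypothetical data: the "complete mix" is the instrumental delayed by two
# samples at half amplitude, with no vocal, so the residual should vanish.
inst = [1.0, -2.0, 3.0, -1.0, 2.0, 0.0, 0.0, 0.0]
mix = [0.0, 0.0, 0.5, -1.0, 1.5, -0.5, 1.0, 0.0]
lag, gain, residual = subtract_instrumental(mix, inst)
```

When a vocal is present, the residual is the estimate of the a cappella signal. The single gain and integer lag cannot model the fixed filtering or drifting timebases mentioned in the abstract, which is the more general case the citing authors address.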