Detecting the singing voice in a piece of music (a soundtrack) has been studied for many years because the technique is the foundation for many advanced applications [1]. We briefly describe some of these applications below. First, if we intend to remove the vocal sound from a soundtrack for karaoke singers, the pre-processing step needs to pinpoint the audio segments that contain singing voice [2]. Second, the best-known portion of a Western popular song is usually the verse, which almost always contains singing. Therefore, music summarization [3] and melody extraction [4] can also benefit from knowing which segments contain singing voice. Third, if we want to identify the singer in a music work, we need the singing segments before conducting recognition [5]. Finally, if we intend to perform lyrics-to-melody conversion [6], we also need to know the singing segments. These examples show that singing voice detection is a fundamental pre-processing step for many applications. There are two types of problems in detecting singing voices in a piece of audio. The first is to mark the starting and ending points of all vocal segments in the soundtrack, referred to as the singing voice segmentation problem. The second is to determine whether a short audio clip (e.g., 2 s) contains any human-perceivable vocal sound, including that of background vocalists. This type of problem is
This paper describes a multiresolution music identification system based on MPEG-7 audio signature descriptors. Such a system may be used to detect illegally copied music circulated over the Internet. In the proposed system, low-resolution descriptors are used to search for likely candidates, and full-resolution descriptors are then used to identify the unknown (query) audio. With this arrangement, the proposed system achieves both high speed and high accuracy. To handle query audio that may not be in the system's database, we suggest two different methods for finding the decision threshold. Simulation results show that the second of these (method II) can achieve an accuracy of 99.4% for query inputs both inside and outside the database. Overall, these results suggest that the proposed system is feasible for copyright control.
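The two-stage search described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain NumPy arrays of shape (frames, dimensions) stand in for MPEG-7 audio signature descriptors, the low-resolution form is assumed to be a block average over consecutive frames, the distance is assumed to be a mean squared difference, and the names `downsample`, `distance`, and `identify`, as well as the default `threshold`, are hypothetical. All descriptors are assumed to have the same shape.

```python
import numpy as np


def downsample(desc, factor):
    """Average each group of `factor` consecutive frames (assumed low-resolution form)."""
    n = (desc.shape[0] // factor) * factor  # drop leftover frames at the end
    return desc[:n].reshape(-1, factor, desc.shape[1]).mean(axis=1)


def distance(a, b):
    """Mean squared difference between two descriptor matrices of equal shape."""
    return float(np.mean((a - b) ** 2))


def identify(query, database, factor=4, n_candidates=3, threshold=0.05):
    """Return the best-matching title, or None if the query appears to be outside the database.

    `database` maps a title to its full-resolution descriptor matrix.
    `threshold` is an illustrative value, not one taken from the paper.
    """
    q_low = downsample(query, factor)

    # Stage 1: rank every entry by the cheap low-resolution distance.
    ranked = sorted(
        database,
        key=lambda title: distance(q_low, downsample(database[title], factor)),
    )
    candidates = ranked[:n_candidates]

    # Stage 2: compare at full resolution, but only against the shortlist.
    best = min(candidates, key=lambda title: distance(query, database[title]))

    # Decision threshold: reject the query if even the best match is too far away.
    if distance(query, database[best]) > threshold:
        return None
    return best
```

In a real system the low-resolution descriptors would presumably be computed once and stored rather than recomputed per query, and the threshold would be derived from data, as the two threshold-selection methods mentioned above suggest.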