This paper presents a comparative evaluation of 12 typical methods for estimating the fundamental frequency (F0) on large speech datasets in reverberant environments. The methods include classical algorithms, such as the cepstrum, AMDF, LPC, and autocorrelation methods, and several modern algorithms based on instantaneous amplitude and/or frequency, such as TEMPO, IFHC, and PHIA. The results show that both the correct rates and the SNRs of the estimated F0s drop sharply as the reverberation time increases. The paper therefore proposes a method for robust and accurate F0 estimation in reverberant environments that combines the modulation transfer function (MTF) concept with the source-filter model in complex cepstrum analysis. The MTF concept is used to remove the dominant reverberant characteristics from the observed reverberant speech, and the source-filter model is used to extract source information from the processed cepstrum. F0s are then estimated from this source information by comb filtering. An additional comparative evaluation was carried out on the proposed method and the other typical methods. The results show that the proposed method is more robust and provides more accurate F0 estimates in reverberant environments than the previously reported methods. [Work supported by a Grant-in-Aid for Science Research from the Japanese Ministry of Education No. 18680017.]
This paper reports a comparative evaluation of our previously proposed method of estimating the fundamental frequency (F0), which is based on complex cepstrum analysis, against nine typical methods on large speech datasets in both artificial and realistic reverberant environments (room acoustics). The nine methods include classical algorithms (cepstrum, AMDF, LPC, and modified autocorrelation) and several modern algorithms (TEMPO, YIN, and PHIA). The results show that the correct rates of the F0s estimated with these methods drop sharply as the reverberation time increases, whereas F0 estimated with the proposed method remains robust and accurate. They also show that homomorphic analysis and the source-filter model are relatively effective for estimating F0. Overall, the proposed method was much better than the previously reported methods in terms of robustness and accuracy of F0 estimates in both artificial and realistic reverberant environments.
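For readers unfamiliar with the classical cepstrum baseline named above, the following is a minimal Python sketch of real-cepstrum F0 estimation on a single voiced frame. It is not the proposed MTF-based complex cepstrum method, and it performs no dereverberation. The function names `cepstrum_f0` and `synth_voiced`, the 60 to 400 Hz search range, the Hamming window, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def cepstrum_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one voiced frame from the peak of its real cepstrum.

    Illustrative baseline only; fmin/fmax are assumed search limits.
    """
    windowed = frame * np.hamming(len(frame))
    # Zero-pad to a power of two at least twice the frame length.
    n_fft = 1 << (2 * len(frame) - 1).bit_length()
    spectrum = np.fft.rfft(windowed, n_fft)
    # Homomorphic step: the log turns the source-filter product into a sum.
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, n_fft)
    # Search for the pitch peak only at quefrencies matching plausible F0s.
    q_lo = int(fs / fmax)
    q_hi = int(fs / fmin)
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs / peak

def synth_voiced(f0, fs, duration=0.04, n_harmonics=20):
    """Hypothetical test signal: harmonics of f0 with 1/k amplitudes."""
    t = np.arange(int(fs * duration)) / fs
    harmonics = [np.cos(2 * np.pi * k * f0 * t) / k
                 for k in range(1, n_harmonics + 1)]
    return np.sum(harmonics, axis=0)

if __name__ == "__main__":
    fs = 16000
    frame = synth_voiced(200.0, fs)
    print(f"estimated F0: {cepstrum_f0(frame, fs):.1f} Hz")
```

The sketch picks the largest cepstral value inside the allowed quefrency range, so its resolution is limited to integer sample periods. On reverberant speech, this kind of estimator is the sort of baseline whose correct rate the abstracts report as degrading with reverberation time.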