We address the problem of increasing the intelligibility of a 300 b/s segment vocoder by investigating 1) new LSP-based distance measures and 2) new structures and construction methods for segment codebooks. We evaluate a variety of new distance measures and find that, after tuning, all of them provide almost equal intelligibility, indicating that some other factor, such as codebook template quality, is limiting performance. In an effort to improve the codebook, we examine multiple duration-dependent codebooks constructed by selecting phonetically labelled segments from the TIMIT database.
We evaluated large-vocabulary continuous-speech recognizer performance as a function of recognizer tuning parameters for four recognition task domains (location, date, time, yes/no) and two different applications (e.g., over-the-telephone reservations) that had some task domains in common. After defining a cost function that included false reject, false accept, and misrecognition errors, we determined optimum parameter values for each domain. The optimum parameter settings differed significantly across domains, and even across applications for the same domain. Using a single set of parameter values for all of the tasks in an application can lead to substantial cost penalties for some individual tasks. These results suggest that there can be substantial benefit in using task-specific tuned recognition parameters. We describe a methodology and set of supporting tools for efficiently performing task-specific tuning.
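The tuning procedure described above can be illustrated with a minimal sketch. The abstract does not give the form of the cost function, the parameters tuned, or any data, so everything here is an assumption made for illustration: a weighted sum of the three error types normalized by utterance count, a single hypothetical rejection threshold as the tuning parameter, and invented error counts for two of the task domains.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Error tallies for one domain at one parameter setting."""
    false_reject: int
    false_accept: int
    misrecognition: int
    total: int  # number of test utterances


def cost(counts, w_fr=1.0, w_fa=1.0, w_mis=1.0):
    """Weighted error cost per utterance.

    The weights are illustrative assumptions, not values from the paper.
    """
    weighted = (w_fr * counts.false_reject
                + w_fa * counts.false_accept
                + w_mis * counts.misrecognition)
    return weighted / counts.total


def best_setting(results):
    """Return the parameter value with the lowest cost for one domain."""
    return min(results, key=lambda p: cost(results[p]))


def best_shared_setting(per_domain):
    """Return the single parameter value that minimizes summed cost over all domains."""
    shared = set.intersection(*(set(r) for r in per_domain.values()))
    return min(sorted(shared),
               key=lambda p: sum(cost(r[p]) for r in per_domain.values()))


# Hypothetical data: rejection threshold -> error counts, per task domain.
per_domain = {
    "date": {
        0.3: ErrorCounts(2, 10, 5, 100),
        0.5: ErrorCounts(5, 4, 4, 100),
        0.7: ErrorCounts(12, 1, 3, 100),
    },
    "yes_no": {
        0.3: ErrorCounts(1, 3, 1, 100),
        0.5: ErrorCounts(4, 2, 1, 100),
        0.7: ErrorCounts(10, 1, 1, 100),
    },
}

shared = best_shared_setting(per_domain)
for domain, results in per_domain.items():
    tuned = best_setting(results)
    # Extra cost this domain pays when forced to use the shared setting.
    penalty = cost(results[shared]) - cost(results[tuned])
    print(f"{domain}: tuned={tuned}, shared={shared}, penalty={penalty:.3f}")
```

With this toy data the shared setting matches the optimum for one domain but not the other, which mirrors the kind of per-task cost penalty the abstract reports; the magnitudes are not meaningful.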