Lee Hetherington scite author profile

In this paper we report on our recent efforts to collect a corpus of spoken lecture material that will enable research directed towards fast, accurate, and easy access to lecture content. Thus far, we have collected a corpus of 270 hours of speech from a variety of undergraduate courses and seminars. We report on an initial analysis of the spontaneous speech phenomena present in these data and the vocabulary usage patterns across three courses. Finally, we examine the perplexities of language models trained on written and spoken materials, and describe an initial recognition experiment on one course.
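As a rough illustration of what comparing language model perplexities involves, the sketch below scores a test transcript under two add-one-smoothed unigram models. The toy sentences, the function name `unigram_perplexity`, and the unigram model itself are assumptions made for illustration; the abstract does not say which language models or data sizes the authors used.

```python
import math
from collections import Counter


def unigram_perplexity(train_tokens, test_tokens, vocab):
    """Perplexity of test_tokens under an add-one-smoothed unigram model.

    `vocab` is a closed vocabulary shared by every model being compared,
    so that all models spread probability mass over the same word set.
    """
    counts = Counter(train_tokens)
    denom = len(train_tokens) + len(vocab)
    log_prob = 0.0
    for tok in test_tokens:
        # Add-one smoothing keeps unseen words from getting zero probability.
        log_prob += math.log2((counts[tok] + 1) / denom)
    # Perplexity is 2 raised to the average negative log2 probability.
    return 2 ** (-log_prob / len(test_tokens))


# Hypothetical toy data, not taken from the lecture corpus.
written = "the results are shown in the table".split()
spoken = "um so the the results are uh shown here".split()
test = "so um the results are here".split()
vocab = set(written) | set(spoken) | set(test)

pp_written = unigram_perplexity(written, test, vocab)
pp_spoken = unigram_perplexity(spoken, test, vocab)
print(f"written-trained: {pp_written:.2f}, spoken-trained: {pp_spoken:.2f}")
```

On this toy data the model trained on the spoken-style text assigns the spoken-style test sentence a lower perplexity, which is the kind of mismatch effect such a comparison is meant to expose.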


Baum-Welch training for segment-based speech recognition

Han

Hetherington

Glass


The use of segment-based features and segmentation networks in a segment-based speech recognizer complicates the probabilistic modeling because it alters the sample space of all possible segmentation paths and the feature observation space. This paper describes a novel Baum-Welch training algorithm for segment-based speech recognition which addresses these issues by an innovative use of finite-state transducers. This procedure has the desirable property of not requiring the initial seed models that were needed by the Viterbi training procedure we have used previously. On the PhoneBook telephone-based corpus of read, isolated words, the Baum-Welch training algorithm obtained a relative error reduction of 37% on the training set and a relative error reduction of 5% on the test set, compared to Viterbi-trained models. When combined with a duration model and a more flexible segmentation network, the Baum-Welch-trained models obtain an overall word error rate of 7.6%, which is the best result we have seen published for the 8,000-word task.
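For readers unfamiliar with the "relative error reduction" figures quoted in this abstract, the following minimal sketch shows how such a figure is computed from two error rates. The function name and the example error rates are hypothetical; the abstract does not report the Viterbi baseline error rates.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, chosen only to illustrate the arithmetic.
baseline_wer = 10.0
new_wer = 9.5
reduction = relative_error_reduction(baseline_wer, new_wer)
print(f"relative error reduction: {reduction:.1%}")
```

A relative reduction is measured against the baseline error rather than in absolute percentage points, so the same relative figure corresponds to a smaller absolute gain when the baseline error is already low.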


SAPPHIRE: an extensible speech analysis and recognition tool based on Tcl/Tk

Hetherington

McCandless


The SAPPHIRE system is a powerful, extensible, object-oriented toolkit allowing researchers to rapidly build and configure customized speech analysis tools. Implemented in Tcl/Tk and C, the current version of SAPPHIRE provides a wide range of functionality, including the ability to configure and run the SUMMIT speech recognition system. We now use SAPPHIRE widely in almost all aspects of our speech analysis and recognition research.

MOTIVATION

Investing in the development of tools and other forms of infrastructure is a critical role that all research institutions must periodically undertake. Unfortunately, such an investment is time-consuming, and the benefits are not always apparent or readily measurable. As a result, one can easily be so distracted by day-to-day affairs that such activities are indefinitely deferred.

Powerful tools are particularly necessary for research and development of technologies involving human language, since the processes of signal modelling, algorithm development, and system evaluation rely largely on empirical evidence. The situation is confounded by the fact that, more often than not, research activities involve not a single individual but a team of collaborators, each responsible for a specific aspect of the problem. A comprehensive set of research tools will greatly enhance researchers' ability to collectively explore, formulate, and test new ideas with minimum effort.

There are several existing tools geared towards the development of human language technologies. In the mid-eighties our group developed SPIRE [1], a set of Lisp-machine-based signal analysis and display tools that enjoyed limited distribution. Another speech-related tool is the ISP package designed by Kopec [2]. Perhaps the most widely used is the signal processing and display package from Entropic, ESPS/Waves [3]. Recently, the CSLU at OGI has also developed and disseminated a set of tools, based on Tcl/Tk, that allow researchers to rapidly configure a dialogue-based speech interface [4].

Over the past few years, our group has been actively developing human language technologies, and embedding these technologies in conversational systems. In this process, we have become increasingly aware of the multiple roles played by tools, and the important properties that they must possess. First, the tools must be intuitive. A toolkit that requires significant training on the part of the user is unlikely to gain wide usage by people with varying computing skills, and will, in all likelihood, have minimal impact. Second, they must be comprehensive. Since the life cycle of research and development involves exploratory studies, signal modelling, algorithm development, system evaluation, and error analysis, the toolkit must support all these activities and more. Third, they must be customizable. For example, a user should be able to change the computing and display parameters easily ...

Footnote 1: This research was supported by DARPA under contract N66001-94-C-6040, monitored through Naval Command, Control and Ocean Surveillance Center.
