This textbook, available in two volumes, has been developed from a course taught at Harvard over the last decade. The course covers principally the theory and physical applications of linear algebra and of the calculus of several variables, particularly the exterior calculus. The authors adopt the 'spiral method' of teaching, covering the same topic several times at increasing levels of sophistication and range of application. Thus the reader develops a deep, intuitive understanding of the subject as a whole, and an appreciation of the natural progression of ideas. Topics covered include many items previously dealt with at a much more advanced level, such as algebraic topology (introduced via the analysis of electrical networks), exterior calculus, Lie derivatives, and star operators (which are applied to Maxwell's equations and optics). This then is a text which breaks new ground in presenting and applying sophisticated mathematics in an elementary setting. Any student, interpreted in the widest sense, with an interest in physics and mathematics, will gain from its study.
For large-vocabulary continuous speech recognition, the goal of training is to model phonemes with enough precision so that from the models one could reconstruct a sequence of acoustic parameters that accurately represents the spectral characteristics of any naturally-occurring sentence, including all coarticulation effects that arise either between phonemes in a word or across word boundaries. The aim at Dragon Systems is to collect and process enough training data to accomplish this goal for all of natural spoken English rather than for any one restricted task. The basic unit that must be trained is the "phoneme in context" (PIC), a sequence of three phonemes accompanied by a code for prepausal lengthening. At present, syllable and word boundaries are ignored in defining PICs. More than 16,000 training tokens, half isolated words and half short phrases, were phonemically labeled by a semi-automatic procedure using hidden Markov models. To model a phoneme in a specific context, a weighted average is constructed from training data involving the desired context and acoustically similar contexts. For use in HMM continuous-speech recognition, each PIC is converted to a Markov model that is a concatenation of one to six node models. No phoneme, in all its contexts, requires more than 64 distinct nodes, and the total number of node models ("phonemic segments") required to construct all PICs is only slightly more than 2000. As a result, the entire set of PICs can be adapted to a new speaker on the basis of a couple of thousand isolated words or a few hundred sentences of connected speech. The advantage of this approach to training is that it is not task-specific. From a single training database, Dragon Systems has constructed models for use in a 30,000-word isolated-word recognizer, for connected digits, and for two different thousand-word continuous-speech tasks.
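The PIC construction described in this abstract can be sketched as follows: each phoneme-in-context maps to a short concatenation of node models drawn from a small shared inventory. This is an illustrative sketch only; the names, node labels, and data structures here are assumptions, not Dragon's actual implementation.

```python
# Hypothetical sketch of PIC construction: each phoneme-in-context is a
# concatenation of one to six "node models" drawn from a shared inventory
# (the abstract reports only ~2000 node models covering all PICs).
# All identifiers and values here are illustrative assumptions.

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeModel:
    """One HMM node: an output-distribution label plus a duration parameter."""
    output_dist: str       # label of the shared output distribution
    mean_duration: float   # expected duration in frames

# Shared inventory; per the abstract, no phoneme needs more than 64 nodes.
node_inventory = {
    "ae_1": NodeModel("ae_onset", 3.0),
    "ae_2": NodeModel("ae_steady", 5.0),
    "ae_3": NodeModel("ae_offset", 2.5),
}


def build_pic(node_ids):
    """A PIC HMM is a concatenation of one to six node models."""
    assert 1 <= len(node_ids) <= 6, "PICs use one to six nodes"
    return [node_inventory[n] for n in node_ids]

# "ae" in a hypothetical k_ae_t context, spelled with three nodes:
pic_k_ae_t = build_pic(["ae_1", "ae_2", "ae_3"])
```

Because the nodes are shared across contexts, adapting the small node inventory to a new speaker adapts every PIC built from it, which is what makes adaptation from a few hundred sentences plausible.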
In this paper we present some of the algorithm improvements that have been made to Dragon's continuous speech recognition and training programs, improvements that have more than halved our error rate on the Resource Management task since the last SLS meeting in February 1991. We also report the "dry run" results that we have obtained on the 5000-word speaker-dependent Wall Street Journal recognition task, and outline our overall research strategy and plans for the future. In our system, a set of output distributions, known as the set of PELs (phonetic elements), is associated with each phoneme. The HMM for a PIC (phoneme-in-context) is represented as a linear sequence of states, each having an output distribution chosen from the set of PELs for the given phoneme, and a (double exponential) duration distribution. In this paper we report on two methods of acoustic modeling and training. The first method involves generating a set of (unimodal) PELs for a given speaker by clustering the hypothetical frames found in the spectral models for that speaker, and then constructing speaker-dependent PEL sequences to represent each PIC. The "spectral model" for a PIC is simply the expected value of the sequence of frames that would be generated by the PIC. The second method represents the probability distribution for each parameter in a PEL as a mixture of a fixed set of unimodal components, the mixing weights being estimated using the EM algorithm. In both models we assume that the parameters are statistically independent. We report results obtained using each of these two methods (RePELing/respelling and univariate "tied mixtures") on the 5000-word closed-vocabulary verbalized-punctuation version of the Wall Street Journal task.
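The second method above, univariate tied mixtures, can be illustrated with a minimal sketch: the density of one scalar parameter is a mixture of a fixed set of unimodal components, and only the mixing weights are re-estimated with EM. The Gaussian components, their parameters, and the toy data below are assumptions for illustration, not values from the paper.

```python
# Minimal sketch of univariate "tied mixtures": the component densities
# (here Gaussians with fixed mean/sigma) stay fixed; EM re-estimates only
# the mixing weights. All components and data here are illustrative.

import math


def gauss(x, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_mixture_weights(data, components, n_iter=50):
    """Estimate mixing weights for a fixed set of components via EM."""
    k = len(components)
    w = [1.0 / k] * k                     # uniform initial weights
    for _ in range(n_iter):
        counts = [0.0] * k
        for x in data:
            # E-step: posterior responsibility of each component for x
            likes = [w[j] * gauss(x, *components[j]) for j in range(k)]
            total = sum(likes)
            for j in range(k):
                counts[j] += likes[j] / total
        # M-step: new weights are normalized expected counts
        w = [c / len(data) for c in counts]
    return w

components = [(0.0, 1.0), (5.0, 1.0)]     # fixed (mu, sigma) pairs
data = [0.1, -0.2, 0.3, 0.0, 5.1, 4.9, 5.2, 4.8]
weights = em_mixture_weights(data, components)
```

Because the components are shared ("tied") across all PELs, each PEL needs to store only its weight vector, which keeps the number of free parameters small.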
We present a 1000-word continuous speech recognition (CSR) system that operates in real time on a personal computer (PC). The system, designed for large vocabulary natural language tasks, makes use of phonetic Hidden Markov models (HMM) and incorporates acoustic, phonetic, and linguistic sources of knowledge to achieve high recognition performance. We describe the various components of this system. We also present our strategy for achieving real-time recognition on the PC. Using a 486-based PC with a 29K-based add-on board, the recognizer has been timed at 1.1 times real time.
In this paper we present preliminary results obtained at Dragon Systems on the Resource Management benchmark task. The basic conceptual units of our system are Phonemes-in-Context (PICs), which are represented as Hidden Markov Models, each of which is expressed as a sequence of Phonetic Elements (PELs). The PELs corresponding to a given phoneme constitute a kind of alphabet for the representation of PICs. For the speaker-dependent tests, two basic methods of training the acoustic models were investigated. The first method of training the Resource Management models is to re-estimate the models for each test speaker from that speaker's training data, keeping the PEL spellings of the PICs fixed. The second approach is to use the re-estimated models from the first method to derive a segmentation of the training data, then to respell the PICs in a largely speaker-dependent manner in order to improve the representation of speaker differences. A full explanation of these methods is given, as are results using each method. In addition to reporting on two different training strategies, we discuss N-Best results. The N-Best algorithm is a modification of the algorithm proposed by Soong and Huang at the June 1990 workshop. This algorithm runs as a post-processing step and uses an A*-search (an algorithm also known as a 'stack decoder').
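The N-Best post-processing step mentioned above can be sketched as a best-first search over partial hypotheses (here a uniform-cost variant, i.e. A* with a zero look-ahead heuristic) that pops complete hypotheses off a priority queue until N answers are found. The toy word lattice and arc costs below are illustrative assumptions, not data from the paper.

```python
# Hedged sketch of an N-Best stack-decoder search: a priority queue of
# partial hypotheses, expanded best-first, yielding the N lowest-cost
# complete paths. The lattice and costs are made-up illustrations.

import heapq

# successors[word] -> list of (next_word, arc_cost) pairs,
# where arc_cost is a negative log probability.
successors = {
    "<s>": [("show", 1.0), ("list", 1.2)],
    "show": [("ships", 0.5), ("ports", 0.9)],
    "list": [("ships", 0.7)],
    "ships": [("</s>", 0.1)],
    "ports": [("</s>", 0.1)],
}


def n_best(n):
    """Return the n lowest-cost paths from <s> to </s>."""
    stack = [(0.0, ["<s>"])]          # the "stack": a cost-ordered queue
    results = []
    while stack and len(results) < n:
        cost, path = heapq.heappop(stack)
        if path[-1] == "</s>":
            results.append((cost, path))   # complete hypothesis popped
            continue
        for word, arc_cost in successors.get(path[-1], []):
            heapq.heappush(stack, (cost + arc_cost, path + [word]))
    return results

hyps = n_best(2)
```

Because hypotheses are popped in cost order, the first N complete paths popped are guaranteed to be the N best, which is why the search can run cheaply as a post-processing step.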