Typically, real-time speech recognition, if achieved at all, is accomplished either by greatly simplifying the processing to be done or by using special-purpose hardware. Each of these approaches has obvious problems: the former results in a substantial loss in accuracy, while the latter often results in obsolete hardware being developed at great expense and delay. Starting in 1990 [1] [2], we have taken a different approach, based on modifying the algorithms to provide increased speed without loss in accuracy. Our goal has been to use commercially available off-the-shelf (COTS) hardware to perform speech recognition. Initially, this meant using workstations with powerful but standard signal processing boards acting as accelerators. However, even these signal processing boards have two significant disadvantages:

1. They often cost as much as the workstation they are plugged into.
2. The interface between each board and workstation is complicated, and always different for each combination of workstation and board.

To make speech recognition available to a broad base of users at an affordable cost, we have eliminated these disadvantages by developing algorithms that operate in real time on COTS workstations, without requiring additional add-on hardware and without sacrificing recognition speed or accuracy. An additional advantage is that we benefit from improvements in workstation price and performance with minimal porting effort. The BBN RUBY TM system, a robust commercialization of the BYBLOS TM speech recognition technology, is the result of this development effort. At the workshop, we demonstrated two example systems that employ the RUBY speech recognition system. Both demonstrations run on Silicon Graphics workstations (Personal IRIS 4D/35 and Indigo), which contain a built-in programmable A/D-D/A.
The signal processing and vector quantization, which run in a separate process from the recognition search, communicate with the search via network sockets. We have reduced the computation required for this front-end processing to the point where enough CPU remains to perform the more expensive search in real time. Since accuracy is our primary concern, we have verified that this signal processing results in the same accuracy as our previous signal processing software.
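The two-process split described above can be sketched as follows. This is a minimal illustrative example, not BBN's actual protocol: the function names, the use of TCP on localhost, and the 2-byte codeword message format are all assumptions made for the sketch.

```python
import socket
import struct
import threading

def recv_exact(conn, n):
    """Read exactly n bytes from a socket, or fewer on EOF."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

def front_end(sock, codewords):
    """Stand-in for signal processing + vector quantization:
    stream each frame's VQ codeword index as a 2-byte integer."""
    for cw in codewords:
        sock.sendall(struct.pack("!H", cw))
    sock.shutdown(socket.SHUT_WR)

def search_process(conn, out):
    """Stand-in for the recognition search: consume codewords until EOF."""
    while True:
        data = recv_exact(conn, 2)
        if len(data) < 2:
            break
        (cw,) = struct.unpack("!H", data)
        out.append(cw)

def run_demo(codewords):
    """Run front end and search as two socket endpoints on localhost."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # ephemeral port
    server.listen(1)
    port = server.getsockname()[1]

    received = []
    def serve():
        conn, _ = server.accept()
        with conn:
            search_process(conn, received)

    t = threading.Thread(target=serve)
    t.start()

    client = socket.socket()
    client.connect(("127.0.0.1", port))
    front_end(client, codewords)
    client.close()
    t.join()
    server.close()
    return received
```

In the real system the two endpoints are separate processes; a thread is used here only to keep the sketch self-contained.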
REAL-TIME ATIS SYSTEM

The ATIS demonstration integrated BBN's DELPHI natural language understanding system with the RUBY speech recognition component. RUBY is used as a black box, controlled entirely through an application programmer's interface (API). The natural language component is our current research system, which runs as a separate process. Both processes run on the same processor, although not at the same time. The NL processing is performed strictly after the speech recognition, since competing for the same processor could not make it faster. (If two separate processors are available, the processing can be overlapped as described.)

The speech recognit...
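The scheduling distinction above can be sketched in code. The example below is a hypothetical illustration: `recognize` and `understand` are stand-ins for the RUBY and DELPHI components, not their actual APIs, and threads model the two-processor case.

```python
from concurrent.futures import ThreadPoolExecutor

def recognize(utterance):
    """Stand-in for the RUBY speech recognition step."""
    return f"words({utterance})"

def understand(word_string):
    """Stand-in for the DELPHI natural language understanding step."""
    return f"meaning({word_string})"

def sequential(utterances):
    """Single processor: NL processing runs strictly after recognition,
    since competing for one CPU would not make either step faster."""
    return [understand(recognize(u)) for u in utterances]

def overlapped(utterances):
    """Two processors: recognition of the next utterance can overlap
    NL processing of the current one."""
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = None
        for u in utterances:
            words = recognize(u)          # "processor 1"
            if pending is not None:
                results.append(pending.result())
            pending = pool.submit(understand, words)  # "processor 2"
        if pending is not None:
            results.append(pending.result())
    return results
```

Both schedules produce the same per-utterance results; the overlapped version simply pipelines the two stages when a second processor is available.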