1999
DOI: 10.1109/49.743698

Quantization of cepstral parameters for speech recognition over the World Wide Web

Abstract: We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web. We compare a server-only processing model, where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Internet. We follow a novel encoding paradigm, trying to maximize recognition performance instead of perceptual reproduction…

Cited by 60 publications (33 citation statements)
References 10 publications
“…From the collected voice samples, it will identify the user moods. A client-server based speech recognition system is described in [8]. In this work, recognition is done at the server side.…”
Section: Literature Survey (mentioning)
confidence: 99%
“…We have chosen this method over the WER-based greedy algorithm of Digalakis et al (1999) because of its computational simplicity and this allows us to scale any bitrate with ease. Table 5 shows the average recognition accuracy of the non-uniform scalar quantiser.…”
Section: Comparison With the Recognition Performance Of The Non-unifo… (mentioning)
confidence: 99%
“…Digalakis et al (1999) evaluated the use of uniform and nonuniform scalar quantisers as well as product code vector quantisers for compressing Mel frequency-warped cepstral coefficients (MFCCs) between 1.2 and 10.4 kbps. They concluded that split vector quantisers achieved word error rates (WER) similar to that of scalar quantisers while requiring less bits.…”
Section: Introduction (mentioning)
confidence: 99%
“…The effect of various speech coding techniques on speech recognition, including GSM (Digalakis et al, 1999; Srinivasamurthy et al, 2000; Kiss, 2000; Lilly and Paliwal, 1996; Srinivasamurthy et al, 2001b), G.723.1, G.727, G.728, G.729 (Turunen and Vlaj, 2001) (Lilly and Paliwal, 1996) and MELP (Srinivasamurthy et al, 2000; Srinivasamurthy et al, 2001b), has been previously evaluated by a number of researchers. In all cases, it was shown that speech coding significantly degrades speech recognition performance.…”
Section: Introduction (mentioning)
confidence: 96%
“…One such client-server system, shown in Fig. 1, is distributed speech recognition (DSR) (Digalakis et al, 1999), where the client contains the feature extractor (the computation requirements of feature extraction are almost the same as those of a vocoder based speech encoder). Only the extracted features are encoded and transmitted to the server with the speech recognizer which operates on the decoded feature data.…”
Section: Introduction (mentioning)
confidence: 99%
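
The distributed speech recognition flow described in the citation statements above (cepstral features encoded at the client, decoded at the server) can be illustrated with a minimal sketch. It uses a plain uniform scalar quantiser, which is only one of the schemes the statements mention, and it is not taken from the paper itself. The function names, the coefficient range `[-10, 10]`, the 13-coefficient frame, the 4-bit allocation, and the 100 frames-per-second rate are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def quantize(coeffs: Sequence[float], lo: float, hi: float,
             bits: Sequence[int]) -> List[int]:
    """Client side: map each cepstral coefficient to an integer index.

    Coefficient i is clipped to [lo, hi] and rounded to one of
    2**bits[i] uniformly spaced levels (each bits[i] must be >= 1).
    """
    indices = []
    for c, b in zip(coeffs, bits):
        levels = (1 << b) - 1              # highest index for b bits
        clipped = min(max(c, lo), hi)      # keep the value inside the range
        indices.append(round((clipped - lo) / (hi - lo) * levels))
    return indices


def dequantize(indices: Sequence[int], lo: float, hi: float,
               bits: Sequence[int]) -> List[float]:
    """Server side: map received indices back to reconstruction values."""
    return [lo + i / ((1 << b) - 1) * (hi - lo)
            for i, b in zip(indices, bits)]


def bit_rate_bps(bits: Sequence[int], frames_per_second: int) -> int:
    """Bits per frame times frames per second."""
    return sum(bits) * frames_per_second


# Hypothetical frame of 13 cepstral coefficients (illustrative values only).
frame = [3.2, -1.5, 0.7, 0.0, -4.1, 2.2, 9.9, -9.9, 0.3, 1.1, -0.6, 5.0, -2.4]
allocation = [4] * 13                      # assumed: 4 bits per coefficient

sent = quantize(frame, -10.0, 10.0, allocation)         # transmitted indices
received = dequantize(sent, -10.0, 10.0, allocation)    # decoded features
rate = bit_rate_bps(allocation, 100)                    # 52 bits x 100 frames/s
```

With this assumed allocation the stream costs 52 bits per frame, and the reconstruction error of each coefficient is bounded by half a quantisation step. Lowering the entries of `allocation` trades reconstruction accuracy for bit rate.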