Vector quantization of pitch information in Mandarin speech

Chen, Sin-Horng; Wang, Yih-Ru

doi:10.1109/26.61370

Cited by 76 publications (3 citation statements)

References 4 publications (2 reference statements)


“…is the observed log-F0 contour of the n-th syllable of an N-syllable word and is represented by the first four orthogonally transformed parameters [4]; …”

Section: Syllable F0 Contour Model (mentioning)

confidence: 99%

Prosodic Modeling for Isolated Mandarin Words and Its Application

Shih, Chiang, Wang et al. 2008

Int. Symp. on Chinese Spoken Language Processing


This paper proposes a new approach to syllable-based modeling of the F0 contour, duration, and energy of isolated Mandarin words. The syllable F0 contour model considers three major affecting factors: lexical tone, syllable position in the word, and inter-syllable coarticulation. The duration and energy models additionally consider a fourth factor, base-syllable type. Experimental results on a large single-speaker database showed that the method performed very well. Based on the prosodic model, a system that helps non-native speakers learn the prosody of Mandarin word pronunciation is designed and implemented.
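The snippet above represents each syllable's log-F0 contour by its first four orthogonally transformed parameters. A minimal sketch of such a transform, assuming the basis is a set of discrete orthonormal polynomials of degree 0 to 3 over the syllable's frames (built here by QR-orthonormalizing a Vandermonde matrix); the exact basis in the cited work may differ, and `orthonormal_poly_basis`, `contour_params`, and `reconstruct_contour` are hypothetical names.

```python
import numpy as np

def orthonormal_poly_basis(n_frames: int, order: int = 4) -> np.ndarray:
    """Hypothetical helper: columns are discrete orthonormal polynomials of
    degree 0..order-1 sampled on the syllable's frame grid."""
    if n_frames < order:
        raise ValueError("need at least `order` frames per syllable")
    t = np.linspace(0.0, 1.0, n_frames)            # normalized time in the syllable
    vander = np.vander(t, order, increasing=True)   # columns 1, t, t^2, t^3
    q, r = np.linalg.qr(vander)                     # Gram-Schmidt via QR
    return q * np.sign(np.diag(r))                  # fix signs: column 0 is positive

def contour_params(log_f0: np.ndarray, order: int = 4) -> np.ndarray:
    """Orthogonal-transform parameters of one syllable's log-F0 contour."""
    basis = orthonormal_poly_basis(len(log_f0), order)
    return basis.T @ log_f0                         # param 0 = mean * sqrt(n_frames)

def reconstruct_contour(params: np.ndarray, n_frames: int) -> np.ndarray:
    """Smoothed contour rebuilt from the retained parameters."""
    return orthonormal_poly_basis(n_frames, len(params)) @ params
```

For example, `contour_params(np.log(f0_hz))` on a syllable with at least four voiced frames yields four numbers: the first tracks the pitch level and the other three the contour shape, which gives a compact fixed-length vector per syllable regardless of its duration.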



“…The example features used include the average value of the pitch within the syllable, the average of the absolute value of the pitch slope within the syllable, the range of the pitch within the syllable, the pitch reset across the boundary, and so on. In order to represent the shape of the pitch contour within a syllable, we also used the first four coefficients of the Legendre discrete polynomial expansion of the contour [8], for which the zero-th order coefficient represents the level of the contour, and the other three coefficients represent the key characteristics of the contour shape. A total of 16 pitch-related attributes were used here for each syllable boundary.…”

Section: Prosodic Features (mentioning)

confidence: 99%

Improved large vocabulary Mandarin speech recognition using prosodic features

Huang, Lee 2006

Speech Prosody 2006


This paper presents a new framework for improving large-vocabulary Mandarin speech recognition with prosodic features. The prosodic information is formulated as a probabilistic model compatible with the conventional maximum a posteriori (MAP) framework for large-vocabulary speech recognition. A set of prosodic features reflecting the special characteristics of Mandarin Chinese is developed, and both syllable-level and prosodic-word-level prosodic models are trained with a decision-tree algorithm. Recognition runs in two passes: each word arc in the word graph produced by the first pass is rescored in the second pass using the two prosodic models. Experiments show reasonable improvements in recognition accuracy. The approach does not require a prosodically labeled training corpus and works for large-scale speaker-independent tasks.
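Several of the per-syllable pitch features listed in the Huang and Lee snippet can be sketched as follows. This is a minimal illustration, assuming `f0` holds one syllable's voiced-frame pitch values in Hz, the slope is the frame-to-frame difference, and the four Legendre coefficients come from a least-squares fit with NumPy's `numpy.polynomial.legendre.legfit` over time mapped to [-1, 1] (a continuous-Legendre stand-in for the discrete expansion cited as [8]). The pitch-reset definition (difference of syllable means across the boundary) is an assumption, and `syllable_pitch_features` and `pitch_reset` are hypothetical names.

```python
import numpy as np
from numpy.polynomial import legendre

def syllable_pitch_features(f0: np.ndarray) -> dict:
    """Hypothetical per-syllable pitch features; f0 = voiced-frame F0 in Hz."""
    slope = np.diff(f0)                      # frame-to-frame pitch change
    x = np.linspace(-1.0, 1.0, len(f0))      # map the syllable onto [-1, 1]
    coeffs = legendre.legfit(x, f0, deg=3)   # least-squares Legendre expansion
    return {
        "mean": float(np.mean(f0)),
        "mean_abs_slope": float(np.mean(np.abs(slope))),
        "range": float(np.max(f0) - np.min(f0)),
        "legendre": coeffs,                  # c0 ~ level, c1..c3 ~ shape
    }

def pitch_reset(prev_f0: np.ndarray, next_f0: np.ndarray) -> float:
    """Assumed definition: mean pitch after the boundary minus mean pitch before it."""
    return float(np.mean(next_f0) - np.mean(prev_f0))
```

For a linearly rising syllable such as `[100, 110, 120, 130, 140]`, the fit is exact: `c0` equals the mean (120), `c1` carries the rise, and the higher coefficients vanish, which matches the snippet's reading of the zeroth coefficient as level and the rest as shape.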


“…For example, a predictive quantization approach was used in [1], and in [2] the pitch values were coded using a shaped lattice quantizer. In [3], the exploitation of redundancies was taken one step further: the pitch information was orthogonally transformed and vector quantized, taking into account the tonal nature of Mandarin speech.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient technique for quantization of pitch contours

Nurminen, Himanen, Rämö 2006

Speech Prosody 2006


This paper introduces an efficient technique for pitch contour quantization, designed mainly for applications that must store speech or prosodic information at a high compression ratio. Instead of quantizing the estimated pitch values directly, the technique forms and quantizes a simplified model of the pitch contour. The simplified contour is constructed so that the amount of information needed to describe it is minimized, while its deviation from the original contour is kept below a predetermined limit. Besides the high compression ratio, the contour representation offers benefits in pitch-synchronous decoding. The technique is implemented and evaluated in a practical storage speech coder, where it achieves perceptually satisfactory quality at an average bit rate of about 100 bits per second.
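The abstract's key property is a simplified contour that needs little information to describe while staying within a preset deviation from the original. The abstract does not give the construction, so the sketch below uses a swapped-in, generic method: a greedy piecewise-linear approximation that extends each segment as long as its maximum absolute error stays within a tolerance, so only the breakpoints need to be stored. This is not the authors' technique; `simplify_contour`, `rebuild_from_breaks`, and `tol` are hypothetical.

```python
import numpy as np

def simplify_contour(f0: np.ndarray, tol: float) -> list[int]:
    """Greedy piecewise-linear simplification (generic illustration).

    Returns breakpoint frame indices such that linear interpolation of f0
    between consecutive breakpoints never deviates from f0 by more than tol.
    """
    n = len(f0)
    breaks = [0]
    start = 0
    while start < n - 1:
        end = start + 1                      # a two-point segment is always exact
        while end + 1 < n:
            idx = np.arange(start, end + 2)
            line = np.interp(idx, [start, end + 1], [f0[start], f0[end + 1]])
            if np.max(np.abs(line - f0[start:end + 2])) > tol:
                break                        # extending further would exceed tol
            end += 1
        breaks.append(end)
        start = end
    return breaks

def rebuild_from_breaks(f0: np.ndarray, breaks: list[int]) -> np.ndarray:
    """Decoder side: interpolate between the stored (index, value) breakpoints."""
    return np.interp(np.arange(len(f0)), breaks, f0[breaks])
```

A larger `tol` yields fewer breakpoints and hence fewer bits, at the cost of a coarser contour; the bound on deviation holds by construction for every frame.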


Vector quantization of pitch information in Mandarin speech

Abstract: By taking advantage of the simple tone structure of pitch contours in Mandarin speech, pitch information is orthogonally transformed and vector quantized. An average bit rate of 0.78 bits/frame (34.67 bits/s) for voiced sounds was achieved.
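The vector-quantization stage can be sketched as follows, assuming each voiced segment's pitch has already been reduced to a short parameter vector (for instance by an orthogonal transform) and the codebook is trained with plain k-means using farthest-point initialization, an LBG-like stand-in. The paper's actual codebook design, vector dimension, and tone-dependent structure are not reproduced; all function names are hypothetical. Only the codeword index is transmitted, so a fixed-length index into a K-entry codebook costs log2 K bits. The two quoted rates also imply a frame rate of about 34.67 / 0.78 ≈ 44.4 frames/s, consistent with a frame period of roughly 22.5 ms (an inference from the numbers, not a value stated in the abstract); spending under one bit per frame presumably means each index covers several frames.

```python
import numpy as np

def encode(vectors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the nearest codeword (squared Euclidean distance) per vector."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def decode(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Receiver side: look the transmitted indices up in the shared codebook."""
    return codebook[indices]

def train_codebook(data: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Hypothetical k-means codebook with farthest-point initialization."""
    data = np.asarray(data, dtype=float)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of training vectors")
    codebook = [data[0]]
    while len(codebook) < k:
        # Next initial codeword: the training vector farthest from all chosen ones.
        cb = np.array(codebook)
        dist = ((data[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        codebook.append(data[dist.argmax()])
    codebook = np.array(codebook)
    for _ in range(iters):
        idx = encode(data, codebook)
        for j in range(k):
            members = data[idx == j]
            if len(members):  # keep the old codeword if its cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def bits_per_second(bits_per_frame: float, frames_per_second: float) -> float:
    """Average bit rate from bits/frame and frame rate."""
    return bits_per_frame * frames_per_second
```

With a 22.5 ms frame period, `bits_per_second(0.78, 1000 / 22.5)` evaluates to about 34.67, matching the figure in the abstract.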
