Chieko Furuichi scite author profile

Mel Log Spectrum Approximation (MLSA) filter for speech synthesis

The spectral envelope of speech can be represented efficiently by the log magnitude spectrum on a nonlinear frequency scale close to the mel scale (called the mel-log spectrum). The mel cepstrum, defined by the Fourier coefficients of the mel-log spectrum, is also considered a suitable parameter for representing the spectral envelope. So far, no satisfactory synthesis filter approximating the mel-log spectrum has been reported. This paper presents a method of constructing the mel-log spectrum approximation (MLSA) filter, which has a relatively simple structure and low coefficient sensitivity, together with a design example of an MLSA filter for speech synthesis. The transfer function of the MLSA filter is a Padé approximation of the exponential of the transfer function of a so-called basic filter. Since the transfer function of the basic filter is a polynomial whose variable is the transfer function of a first-order all-pass filter, realizing the filter requires removing the delay-free path from the feedback loop. With the construction method shown in this paper, the delay-free path can easily be removed from the feedback loop of the MLSA filter. The resulting MLSA filter has a relatively simple structure and low coefficient sensitivity. The quantization characteristics of the coefficients are also satisfactory.
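As a rough illustration of the Padé step described above, the following sketch compares the exponential function with a diagonal Padé rational approximation. It is not the paper's filter design: the order 4, the use of the standard (unmodified) Padé coefficients, and the function names `pade_coefficients` and `pade_exp` are assumptions made here for illustration, and the basic filter itself is replaced by a plain number `w`.

```python
from math import exp, factorial


def pade_coefficients(order):
    """Numerator coefficients of the diagonal [order/order] Pade approximant of exp(w)."""
    return [
        factorial(2 * order - k) * factorial(order)
        / (factorial(2 * order) * factorial(k) * factorial(order - k))
        for k in range(order + 1)
    ]


def pade_exp(w, order=4):
    """Approximate exp(w) by P(w) / P(-w), where P uses the coefficients above.

    In an MLSA-style filter, w would be the output of the basic filter;
    here it is a plain real or complex number (an assumption for illustration).
    """
    coeffs = pade_coefficients(order)
    numerator = sum(c * w ** k for k, c in enumerate(coeffs))
    denominator = sum(c * (-w) ** k for k, c in enumerate(coeffs))
    return numerator / denominator


if __name__ == "__main__":
    for w in (0.1, 0.5, 1.0, 2.0):
        print(w, exp(w), pade_exp(w), abs(exp(w) - pade_exp(w)))
```

The approximation error grows with the magnitude of `w`, which suggests why the range of the basic filter's output matters for the accuracy of such a filter.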


Phonemic segmentation for continuous Mandarin speech recognition.

Imai

Furuichi

1997

J. Acoust. Soc. Jpn. (E)


This paper proposes a phonemic segmentation algorithm to improve the performance of a continuous Mandarin speech recognition system. The coefficient of time variation of the spectral envelope and the coefficient of time variation of the zero-order cepstrum are extracted using Unbiased Estimation of Log Spectrum (UELS). The parameter curves based on these coefficients are very smooth, so the relation between the maxima of the parameters and the phoneme boundaries is easy to find. Because the curves are smooth, the maximum value can be used as the criterion for delimiting phonemes, rather than the threshold used in conventional systems, which makes precise segmentation possible. In an experiment on 300 sentences, the system performed better than traditional methods: the average phoneme-deletion rate is 1.3% and the average phoneme-insertion rate is 3%. For evaluation, the segmentation results were used in a phoneme recognition experiment, which gave a consonant recognition rate of 95.5% and a vowel recognition rate of 92.5%. The results show that the approach is highly effective.
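The difference between a maximum-based criterion and a threshold-based one can be sketched as follows. This is a minimal illustration on a synthetic curve, not the paper's UELS-based parameters; the function names `local_maxima` and `above_threshold`, the sample values, and the threshold 0.35 are all hypothetical.

```python
def local_maxima(curve):
    """Return frame indices where the curve has a local maximum."""
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] >= curve[i + 1]
    ]


def above_threshold(curve, threshold):
    """Return frame indices where the curve exceeds a fixed threshold."""
    return [i for i, value in enumerate(curve) if value > threshold]


if __name__ == "__main__":
    # Synthetic, smooth "time variation" curve (one value per frame).
    curve = [0.0, 0.2, 0.9, 0.3, 0.1, 0.4, 1.1, 0.5]
    print("maxima:", local_maxima(curve))
    print("threshold:", above_threshold(curve, 0.35))
```

On this toy curve the maximum criterion marks one frame per peak, while the threshold marks every frame above the chosen level, so its output depends on how the threshold is tuned.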


Speech recognition using stochastic phonemic segment model based on phoneme segmentation

2000


This paper discusses speech recognition based on a new statistical phoneme segment model that is trained on phoneme parameters derived from automatically extracted phoneme segments. The proposed system operates as follows. In preprocessing before recognition, the phoneme boundaries are detected by segmentation. The phonemes are discriminated using a stochastic phoneme segment model, and a phoneme segment lattice with scores is constructed. Next, speech recognition is performed by matching symbol sequences to dictionary items. The segmentation system employed can infer phoneme boundaries with high accuracy. This helps to eliminate unnecessary parameters, leaving the feature parameters that are effective in separating phonemes. In other words, the phoneme recognition problem in continuous speech can be reduced to a discrimination problem, and thus a speaker-independent model can be constructed from a relatively small amount of training data. The stochastic phoneme segment model is trained with samples extracted from a phoneme-balanced word set of 4920 words uttered by 10 speakers. In a recognition experiment with 6709 words uttered by 63 nontraining speakers, an average recognition rate of 92.6% over all speakers was obtained, using a word dictionary of 212 words. © 2000 Scripta Technica, Syst Comp Jpn, 31(10): 89–98, 2000


Use of static/dynamic parameters in automatic phonemic segmentation system for English continuous speech

Furuichi

Aizawa

Imai

1996

Electron Comm Jpn Pt III


In this paper, a phonemic segmentation system is constructed that is useful in practice as preprocessing for English continuous speech recognition, and its usefulness is demonstrated. English continuous speech contains many weakly voiced vowels with ambiguous articulation, called schwa, which make stable detection of phonemic boundaries difficult. In fact, there has been almost no detailed proposal for a segmentation system that can be used in practice as preprocessing for recognition. In the proposed method, the mel cepstrum is extracted from the speech signal using the unbiased estimation of the log spectrum. This process is known to be stable, i.e., less affected by the fine spectral structure. Then the dynamic segmentation parameters are determined from the mel cepstrum, using a pseudo-differentiation filter, such that the boundaries of schwa phonemes can be detected. By combining the dynamic parameters with the static parameters, phoneme-wise segmentation is executed in a hierarchical form. The proposed system can segment English continuous speech containing diverse phonemic environments into phonemes on the time axis. The system operates based only on acoustic knowledge of phonemes that is common to speakers with different voice qualities and utterances, without using complex speaker-dependent boundary detection rules. The proposed system is also evaluated experimentally using English continuous phoneme-balanced speech uttered by one male and one female native English speaker for 350 s. For a total of 3024 phonemes, the detection ratio for phoneme boundaries is 97.1 percent, the boundary deletion ratio is 2.9 percent, and the boundary insertion ratio is 24.2 percent.
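One common way to obtain a dynamic parameter from a sequence of cepstral frames is a regression slope over a short window. The sketch below uses that technique as a stand-in; it is not the pseudo-differentiation filter of the paper, whose definition is not given here. The names `regression_delta` and `dynamic_measure`, the window half-width of 2, and the toy input are assumptions for illustration.

```python
def regression_delta(frames, half_width=2):
    """Per-coefficient regression slope over +/- half_width frames.

    frames is a list of equal-length lists (one cepstral vector per frame).
    Frame indices past either end are clamped to the first or last frame.
    """
    n = len(frames)
    norm = 2 * sum(k * k for k in range(1, half_width + 1))
    deltas = []
    for t in range(n):
        row = []
        for d in range(len(frames[0])):
            acc = 0.0
            for k in range(1, half_width + 1):
                ahead = frames[min(t + k, n - 1)][d]
                behind = frames[max(t - k, 0)][d]
                acc += k * (ahead - behind)
            row.append(acc / norm)
        deltas.append(row)
    return deltas


def dynamic_measure(frames, half_width=2):
    """Euclidean norm of the delta vector at each frame."""
    return [
        sum(v * v for v in row) ** 0.5
        for row in regression_delta(frames, half_width)
    ]


if __name__ == "__main__":
    # Toy one-dimensional "cepstrum" with an abrupt change between frames 3 and 4.
    frames = [[0.0]] * 4 + [[1.0]] * 4
    print(dynamic_measure(frames))
```

On the toy input the measure is largest at the frames adjacent to the change, which is the kind of cue a segmentation stage could combine with static parameters.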


