Speech Structure and Its Application to Robust Speech Processing

Minematsu, Nobuaki; Asakawa, Susumu; Suzuki, Masayuki; Qiao, Yu

doi:10.1007/s00354-009-0091-y

Cited by 21 publications

(36 citation statements)

References 26 publications

(43 reference statements)

“…Minematsu et al proposed a new method of representing speech, called speech structure, and proved that the acoustic variations, corresponding to any linear transformation in the cepstrum domain, can be effectively unseen in the representation [9]. This invariance is due to the invariance of the Bhattacharyya distance (BD), which is calculated using equation 2 and is proved to be invariant with any linear transform.…”

Section: Invariant Pronunciation Structure (mentioning)

confidence: 99%

“…The BD is calculated from any pair of distributions and the resulting full set of the BDs forms an invariant distance matrix. This matrix-based representation of an utterance is called pronunciation structure [9]. The structure only represents the local and global contrastive aspects of a given utterance, which is theoretically similar to Jakobson's structural phonology [10].…” [Figure captions embedded in the excerpt: Fig. 6, Speaker-independent pronunciation structure; Fig. 7, Inter-speaker structure difference [12]]

Section: Invariant Pronunciation Structure (mentioning)

confidence: 99%
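
The two excerpts above describe a distance matrix of Bhattacharyya distances (BDs) that is unchanged by linear transformation of the features. The following is a minimal sketch of that idea, not the cited authors' implementation: "equation 2" is not reproduced on this page, so the sketch assumes the standard closed-form BD between two Gaussians, uses 2-dimensional toy distributions, and all function names (`bhattacharyya`, `structure`, `transform`) and numeric values are hypothetical.

```python
import math

# 2x2 linear-algebra helpers (kept in pure Python so the sketch is self-contained).
def mat_vec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv(a):
    d = det(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def bhattacharyya(g1, g2):
    """Closed-form BD between two Gaussians given as (mean, covariance)."""
    (m1, s1), (m2, s2) = g1, g2
    s = [[(s1[i][j] + s2[i][j]) / 2 for j in range(2)] for i in range(2)]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    sd = mat_vec(inv(s), diff)
    maha = diff[0] * sd[0] + diff[1] * sd[1]
    return maha / 8 + 0.5 * math.log(det(s) / math.sqrt(det(s1) * det(s2)))

def structure(gaussians):
    """Full matrix of pairwise BDs (the 'structure' of the distribution set)."""
    return [[bhattacharyya(a, b) for b in gaussians] for a in gaussians]

def transform(g, a, b):
    """Apply x -> A x + b to a Gaussian: mean -> A m + b, cov -> A S A^T."""
    m, s = g
    am = mat_vec(a, m)
    return ([am[0] + b[0], am[1] + b[1]], mat_mul(mat_mul(a, s), transpose(a)))

# Toy distributions (assumed values, for illustration only).
gaussians = [
    ([0.0, 0.0], [[1.0, 0.2], [0.2, 0.5]]),
    ([1.5, -0.5], [[0.8, -0.1], [-0.1, 1.2]]),
    ([-1.0, 2.0], [[0.6, 0.0], [0.0, 0.9]]),
]
A = [[2.0, 0.5], [-0.3, 1.5]]  # any invertible matrix
b = [3.0, -1.0]

original = structure(gaussians)
warped = structure([transform(g, A, b) for g in gaussians])
max_gap = max(abs(original[i][j] - warped[i][j])
              for i in range(3) for j in range(3))
print(max_gap)  # difference between the two matrices, up to rounding error
```

Under these assumptions, the two matrices agree up to floating-point error because both the Mahalanobis-like term and the determinant ratio in the Gaussian BD are unaffected when every distribution is mapped by the same invertible affine transform.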

“…We also consider that our system can become more comparable to the perfect recognizer by tuning input features and regression methods. For features, we can use Multiple Stream Structuralization (MSS) [9] and, as discussed in [12], use of absolute features in addition to contrast (relational) features will also be effective to improve the performance. For regression, we're interested in applying kNN-SVR [25] to our task.…”

Section: SVR to Predict Pronunciation Distances Among Speakers (mentioning)

confidence: 99%

“…To this end, we use pronunciation structure analysis for feature extraction and we also use support vector regression for distance prediction. The invariant structure analysis was proposed in [8,9] inspired by Jakobson's structural phonology [10] and it can extract invariant and robust features. The structural features were already introduced to various tasks such as pronunciation scoring [11,12], pronunciation error detection [13], language learners clustering [14], dialect analysis [15], and automatic speech recognition [16,17,18].…”

Section: Introduction (mentioning)

confidence: 99%

Automatic pronunciation clustering using a World English archive and pronunciation structure analysis

Shen, Minematsu, Makino, et al. 2013

2013 IEEE Workshop on Automatic Speech Recognition and Understanding

English is the only language available for global communication. Due to the influence of speakers' mother tongue, however, those from different regions inevitably have different accents in their pronunciation of English. The ultimate goal of our project is creating a global pronunciation map of World Englishes on an individual basis, for speakers to use to locate similar English pronunciations. If the speaker is a learner, he can also know how his pronunciation compares to other varieties. Creating the map mathematically requires a matrix of pronunciation distances among all the speakers considered. This paper investigates invariant pronunciation structure analysis and Support Vector Regression (SVR) to predict the inter-speaker pronunciation distances. In experiments, the Speech Accent Archive (SAA), which contains speech data of worldwide accented English, is used as training and testing samples. IPA narrow transcriptions in the archive are used to prepare reference pronunciation distances, which are then predicted based on structural analysis and SVR, not with IPA transcriptions. Correlation between the reference distances and the predicted distances is calculated. The experimental results are very promising, and our proposed method far outperforms a baseline system developed using an HMM-based phoneme recognizer.

“…Recently, a novel structural model of pronunciation was proposed [6], which works effectively to remove the nonlinguistic aspects of speech from speech acoustics and keep the linguistic aspects well at the same time. Since the nonlinguistic change of speech features is often modeled as feature transformation, the novel model is based on completely transform-invariant features, which is f-divergence [7].…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Chinese pronunciation error detection using SVM trained with structural features

Zhao, Hoshino, Suzuki, et al. 2012

2012 IEEE Spoken Language Technology Workshop (SLT)

Self Cite

Pronunciation errors are often made by learners of a foreign language. To build a Computer-Assisted Language Learning (CALL) system to support them, automatic error detection is essential. This study focuses on Japanese learners of Chinese. We investigated automatic detection of their typical and frequent phoneme production errors. To this end, four databases were newly created, and we propose a detection method using Support Vector Machine (SVM) with structural features. The proposed method is compared to two baseline methods, Goodness Of Pronunciation (GOP) and Likelihood Ratio (LR), on the task of phoneme error detection. Experiments show that the proposed method performs much better than both baseline methods. For example, the false rejection rate is reduced by as much as 82%. However, the results also indicate some drawbacks of using SVM with structural features. In this paper, we discuss merits and demerits of the proposed method and the kinds of real applications in which it works effectively.

Discriminative re-ranking for automatic speech recognition by leveraging invariant structures

Suzuki, Kurata, Nishimura, et al. 2015

Speech Communication

Self Cite
