The RWTH 2007 TC-STAR evaluation system for European English and Spanish

Lööf, Jonas; Gollan, Christian; Hahn, Stefan; Heigold, Georg; Hoffmeister, Björn; Plahl, Christian; Rybach, David; Schlüter, Ralf; Ney, Hermann

doi:10.21437/interspeech.2007-579

Cited by 25 publications

(9 citation statements)

References 13 publications

“…Using a baseline 1-pass 60k-words recognition system for the automatic transcription of European Parliament Plenary Sessions (EPPS) in English as described in [4], a recognition quality of about 15% word error rate is achievable with a real-time factor of ∼4 (see Figure 2) on the evaluation corpus of the 2007 TC-STAR Evaluation Campaign. This corpus consists of 2.9h of speech with an out-of-vocabulary rate of 1.1%.…”

Section: Results (mentioning)

confidence: 99%
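The three figures quoted in this statement (word error rate, real-time factor, out-of-vocabulary rate) follow standard definitions, which can be sketched as below. This is a minimal illustration under those textbook definitions, not the scoring procedure used in the TC-STAR evaluation or in the RWTH toolkit; all function names are hypothetical.

```python
from typing import Iterable, Sequence, Set


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time per unit of audio; 1.0 means real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def oov_rate(words: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of running words that are not in the recognition vocabulary."""
    words = list(words)
    if not words:
        raise ValueError("word list must not be empty")
    return sum(w not in vocabulary for w in words) / len(words)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
    print(f"RTF: {real_time_factor(11.6 * 3600, 2.9 * 3600):.2f}")
    print(f"OOV: {oov_rate(['a', 'b', 'c', 'd'], {'a', 'b', 'c'}):.2f}")
```

Under these definitions, a real-time factor of about 4 on a 2.9 h corpus corresponds to roughly 11.6 h of processing time, assuming a single decoding stream.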

“…The applicability of our toolkit to real-life tasks has been proven by building several large vocabulary systems in recent international research projects. The European English and Spanish recognition systems developed during the TC-STAR Project are based on our RWTH ASR toolkit [4]. These two systems achieved the best results in the 2007 TC-STAR Evaluation Campaign.…”

Section: Introduction (mentioning)

confidence: 99%

The RWTH Aachen University open source speech recognition system

et al. 2009

Self Cite

We announce the public availability of the RWTH Aachen University speech recognition toolkit. The toolkit includes state of the art speech recognition technology for acoustic model training and decoding. Speaker adaptation, speaker adaptive training, unsupervised training, a finite state automata library, and an efficient tree search decoder are notable components. Comprehensive documentation, example setups for training and recognition, and a tutorial are provided to support newcomers.

“…Experiments were conducted based on the European Parliament plenary sessions (EPPS) corpus, from the TC-STAR project [3].…”

Section: Methods (mentioning)

confidence: 99%

“…In order to successfully use log-linear modeling in a state-of-the-art speech recognition system, it is necessary to reproduce or replace all important methods used in improving such a system. Speaker adaptation is one important method to improve the performance of a speech recognition system, and especially the use of feature space maximum likelihood linear regression (fMLLR) speaker adaptive training (SAT) [2], has proved to be an important part of state-of-the-art systems [3]. Thus, it is important to develop and investigate adaptation methods for log-linear models if they are to replace Gaussian models in a state-of-the-art speech recognition system.…”

Section: Introduction (mentioning)

confidence: 99%

Discriminative adaptation for log-linear acoustic models

2010

Self Cite

“…questions on central-phoneme, segment, context, position or alternative properties are permitted to be posed in any desired order, and (2) the number of CART leafs, and thus, the tied CD-HMM state emission models, become controllable by keeping the n-best variants instead of solely relying on the construction criteria minimum log-likelihood gain τ ∆ log L and minimum observations τ N S(m) . The single joint CART is fundamental to LVCSR systems in [Lööf et al., 2007], and applied throughout the experimentation described in the subsequent chapters of this thesis.…”

Section: HMM State Tying (mentioning)

confidence: 99%

Large vocabulary continuous speech recognition for the transcription of Catalan broadcast news and conversations : towards analysis and modelling of acoustic reduction in spontaneous speech

Schulz

The transcription of spontaneous speech still poses a challenge to state-of-the-art methods for automatic speech recognition. The present thesis describes the comprehensive development of a large vocabulary continuous speech recognition system for the transcription of Catalan broadcast news and conversations, and evolves towards novel approaches for the analysis and modelling of acoustic reduction in spontaneous speech. It initially focuses on various conventional methods for acoustic analysis, acoustic and language modelling, and hypothesis search. Improvements over the original single-pass baseline system are mainly attained by interpolation of individually estimated language models that emphasises domain and speaking style, linear discriminating projection of acoustic observations that improves the phonetic class separability, speaker normalisation of the acoustic observations, speaker adaptive training, and acoustic model adaptation in a multi-pass system approach. The analysis of acoustic reduction initially focuses on context-independent, vowel- and consonant-specific spectral and temporal properties whose parameters display statistically significant differences between the phoneme prototypes in spontaneous speech and their canonical realisations in planned speech. The introduction of the feature space analysis provides the general means to reveal these differences in conventional acoustic observations for automatic speech recognition. It displays statistically significant differences context-independently, but also in a syllable context between adjacent phonemes, suggesting particular reduction patterns. The analysis furthermore challenges the often suggested coherence between the co-occurring reduction of spectral and temporal properties. The modelling of acoustic reduction first focuses on segment-conditioned discriminating variables, variability-class-dependent models, and variability-class-specific adaptation of the original acoustic model.
It introduces, as discriminating variables, phoneme rate as a means to analyse temporal properties and feature space reduction ratio as a means to analyse the reduction of spectral properties in the conventional feature space for large vocabulary continuous speech recognition. These variables are clustered and determine the classes for segment-conditioned, variability-class-dependent models and their scoring during the hypothesis search in recognition. Both approaches display no significant performance improvement. Furthermore, the modelling advances towards models that depend on the predictability of segment constituents. These introduce predictability as a discriminating variable for variability-class-dependent models, relying on the fundamental coherence between predictability and acoustic reduction that is suggested by the principle of least effort and the redundancy theory. It thereby focuses on word and phoneme predictability. This approach displays no significant performance improvement. Planned speech apparently runs counter to the principle of least effort. Thus, a prior segment-conditioned analysis of acoustic reduction may indicate a segment's average degree of reduction, while its within-segment variation may indicate whether the segment exhibits sufficient relaxation of the speaking style to adopt the principle of least effort. Segments exhibiting small within-segment variation may therefore be modelled separately from those of large within-segment variation, whereas modelling the latter by word, syllable or phoneme predictability dependent models may provide a research perspective.
