Speech Recognition System Combination for Machine Translation

Gales, Mark J. F.; Liu, X.; Sinha, Rohit; Woodland, Philip C.; Yu, Kai; Matsoukas, Spyros; Ng, Tsz‐Wai; Nguyen, Kham; Nguyen, Long; Gauvain, Jean‐Luc; Lamel, Lori; Messaoudi, Abdelkhalek

doi:10.1109/icassp.2007.367310

Cited by 9 publications

(9 citation statements)

References 11 publications

“…However, the system combination yields slightly better results. These results differ from those in [21], although different tasks and error metrics were used.…”

Section: Results (contrasting)

confidence: 80%

Advances in Arabic broadcast news transcription at RWTH

Rybach, Hahn, Gollan, et al. 2007

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

This paper describes the RWTH speech recognition system for Arabic. Several design aspects of the system, including cross-adaptation, multiple system design and combination, are analyzed. We summarize the semi-automatic lexicon generation for Arabic using a statistical approach to grapheme-to-phoneme conversion and pronunciation statistics. Furthermore, a novel ASR-based audio segmentation algorithm is presented. Finally, we discuss practical approaches for parallelized acoustic training and memory efficient lattice rescoring. Systematic results are reported on recent GALE evaluation corpora.

“…When component systems use different word segmentation schemes, a direct combination between their outputs is problematic, for example, in Chinese where different character to word segmentations are used. Hence, for the Mandarin speech recognition tasks considered here, the most successful approach is to perform a character level combination, [3][4][5][11] as is also considered in this paper. This requires the mapping of outputs from a standard word based system to sub-word, character level.…”

Section: Hypothesis Level System Combination (mentioning)

confidence: 99%

“…[2] Therefore, the character error rate (CER) is the commonly used evaluation metric for state-of-the-art Mandarin speech recognition systems. [3][4][5] All languages have constrained syllable constructions and syllable sequence rules which enhance intelligibility. [6] These phonological and pragmatic constraints can be exploited for Chinese speech recognition.…”

Section: Introduction (mentioning)

confidence: 99%

Syllable language models for Mandarin speech recognition: Exploiting character language models

Liu, Hieronymus, Gales, et al. 2013

The Journal of the Acoustical Society of America

Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabiotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character level language models (LMs) can be used as a first level approximation to allowed syllable sequences. To test this idea, word and character level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of texts from a wide collection of text sources. Both hypothesis and model based combination techniques were investigated to combine word and character level LMs. Significant character error rate reductions up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history dependent multi-level LM that performs a log-linear combination of character and word level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance.
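
The citation statements above mention mapping word-level output to the character level and scoring with character error rate (CER). The following is a minimal Python sketch of that idea, not the method of any of the papers listed here. The function names `to_characters`, `edit_distance` and `character_error_rate` are hypothetical, and the sketch assumes that each Unicode code point in a word counts as one character and that substitutions, insertions and deletions all cost 1.

```python
from typing import List, Sequence


def to_characters(words: Sequence[str]) -> List[str]:
    """Flatten a word-level hypothesis into a list of characters.

    Assumption: every code point in a word is one character, so the
    word segmentation is simply discarded.
    """
    return [ch for word in words for ch in word]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance with unit cost for every edit operation."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref_words: Sequence[str],
                         hyp_words: Sequence[str]) -> float:
    """CER = character-level edit distance / number of reference characters."""
    ref = to_characters(ref_words)
    hyp = to_characters(hyp_words)
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# Two systems that segment the same characters differently agree at the
# character level, even though their word sequences differ.
same_chars = character_error_rate(["北京", "大学"], ["北", "京大", "学"])

# One wrong character out of four reference characters.
one_error = character_error_rate(["北京", "大学"], ["北京", "大雪"])
```

Under these assumptions, differences in word segmentation do not affect the score, which is the reason the quoted text gives for combining and evaluating Mandarin systems at the character level.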

“…Concerning the evaluation of text recognition, we rely on standard metrics of speech recognition such as Character Error Rate (CER) and Word Error Rate (WER) for determining the recognition accuracy of the segmentation free OCR output [91]. Note that we measure the quality of the generated pseudo ground-truth only by counting the number of validated text lines because, in our experiment, these data are automatically generated (no manual ground-truth available).…”

Section: Text Recognition (mentioning)

confidence: 99%

Digital Comics Image Indexing Based on Deep Learning

2018

Abstract: The digital comic book market is growing every year now, mixing digitized and digital-born comics. Digitized comics suffer from a limited automatic content understanding which restricts online content search and reading applications. This study shows how to combine state-of-the-art image analysis methods to encode and index images into an XML-like text file. Content description file can then be used to automatically split comic book images into sub-images corresponding to panels easily indexable with relevant information about their respective content. This allows advanced search in keywords said by specific comic characters, action and scene retrieval using natural language processing. We get down to panel, balloon, text, comic character and face detection using traditional approaches and breakthrough deep learning models, and also text recognition using LSTM model. Evaluations on a dataset composed of online library content are presented, and a new public dataset is also proposed.
