This paper introduces a novel captioning method, partial and synchronized captioning (PSC), as a tool for developing second language (L2) listening skills. Unlike conventional full captioning, which provides the full text and allows comprehension of the material merely by reading, PSC promotes listening to the speech by presenting a selected subset of words, where each word is synched to its corresponding speech signal. In this method, word-level synchronization is realized by an automatic speech recognition (ASR) system, dedicated to the desired corpora. This feature allows the learners to become familiar with the correspondences between words and their utterances. Partialization is done by automatically selecting words or phrases likely to hinder listening comprehension. In this work we presume that the incidence of infrequent or specific words and fast delivery of speech are major barriers to listening comprehension. The word selection criteria are thus based on three factors: speech rate, word frequency and specificity. The thresholds for these features are adjusted to the proficiency level of the learners. The selected words are presented to aid listening comprehension while the remaining words are masked in order to keep learners listening to the audio. PSC was evaluated against no-captioning and full-captioning conditions using TED videos. The results indicate that PSC leads to the same level of comprehension as the full-captioning method while presenting less than 30% of the transcript. Furthermore, compared with the other methods, PSC can serve as an effective medium for decreasing dependence on captions and preparing learners to listen without any assistance.
Automatic speech recognition (ASR) results contain not only ASR errors, but also disfluencies and colloquial expressions that must be corrected to create readable transcripts. We take the approach of statistical machine translation (SMT) to "translate" from ASR results into transcript-style text. We introduce two novel modeling techniques in this framework: a context-dependent translation model, which allows for usage of context to accurately model translation probabilities, and log-linear interpolation of conditional and joint probabilities, which allows for frequently observed translation patterns to be given higher priority. The system is implemented using weighted finite state transducers (WFST). On an evaluation using ASR results and manual transcripts of meetings of the Japanese Diet (national congress), the proposed methods showed a significant increase in accuracy over traditional modeling techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.