Interspeech 2018
DOI: 10.21437/interspeech.2018-1096
Punctuation Prediction Model for Conversational Speech

Abstract: Avaya Conversational Intelligence™ (ACI) is an end-to-end, cloud-based solution for real-time Spoken Language Understanding for call centers. It combines large-vocabulary, real-time speech recognition, transcript refinement, and entity and intent recognition in order to convert live audio into a rich, actionable stream of structured events. These events can be further leveraged with a business rules engine, thus serving as a foundation for real-time supervision and assistance applications. After the ingestion, …
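As a rough illustration of the "stream of structured events" mentioned in the abstract, one such event could be modeled as the record below; the field names and types are hypothetical assumptions and not the actual ACI schema:

```python
# Hypothetical shape of a single structured event in the stream described above.
# Field names and types are illustrative assumptions, not the actual ACI schema.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TranscriptEvent:
    call_id: str                 # which live call the event belongs to
    start_ms: int                # audio offset where the utterance begins
    end_ms: int                  # audio offset where it ends
    speaker: str                 # e.g. "agent" or "customer"
    text: str                    # punctuated, refined transcript segment
    entities: List[str] = field(default_factory=list)  # recognized entities
    intent: Optional[str] = None                        # recognized intent, if any
```

A business rules engine could then match on fields such as `intent` or `entities` to trigger real-time supervision or assistance actions.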

Cited by 39 publications (23 citation statements)
References 12 publications
“…In terms of machine learning models, the conditional random field (CRF) has been widely used in earlier studies (Lu and Ng, 2010; Zhang et al., 2013). Lately, deep learning models such as Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN), and transformers have also been used for this task (Che et al., 2016b; Gale and Parthasarathy, 2017; Zelasko et al., 2018; Wang et al., 2018).…”
Section: Introduction
confidence: 99%
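The survey above frames punctuation restoration as word-level sequence labeling; a minimal sketch of such a tagger, assuming PyTorch and a BiLSTM with illustrative vocabulary, label set, and dimensions (the cited papers use their own architectures and features):

```python
# Minimal BiLSTM sequence-labeling sketch for punctuation prediction.
# Vocabulary size, label set, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

PUNCT_LABELS = ["O", "COMMA", "PERIOD", "QUESTION_MARK"]  # assumed label set

class BiLSTMPunctuator(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, len(PUNCT_LABELS))

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        hidden_states, _ = self.lstm(self.emb(token_ids))
        return self.proj(hidden_states)      # (batch, seq_len, num_labels)

# Usage: one punctuation decision per input word.
# logits = BiLSTMPunctuator()(torch.randint(1, 30000, (2, 20)))
```

A CRF layer or a transformer encoder can replace the BiLSTM without changing the overall tag-per-word framing.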
“…The ones with plain text outperform the ones with encoded text. To explore the impact of the min_words_cut value on the quality of the result, we performed the experiment on the sequence-to-sequence LSTM model with an overlap of 15 words and min_words_cut ranging from 0 to 15. The outcome shown in Figure 5 indicates that F1-scores peak in the middle range of chunk size (4–10). This demonstrates that predictions of uppercase and lowercase are stable and independent of min_words_cut.…”
Section: Evaluation on Plain-Text Model and Encoded-Text Model
confidence: 82%
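The quoted experiment slides overlapping windows over the transcript and uses min_words_cut when stitching per-chunk predictions back together; the sketch below shows one plausible chunk-and-merge scheme, with the window size and the exact semantics of overlap and min_words_cut assumed rather than taken from the cited paper:

```python
# Hypothetical chunk-and-merge inference over overlapping windows.
# Window size, overlap, and the exact meaning of min_words_cut are assumptions.
def chunked_predict(tokens, predict_fn, window=50, overlap=15, min_words_cut=7):
    """predict_fn maps a list of tokens to one label per token."""
    assert min_words_cut <= overlap < window
    step = window - overlap
    merged, start = [], 0
    while True:
        chunk = tokens[start:start + window]
        labels = predict_fn(chunk)
        last = start + window >= len(tokens)
        # Each window "owns" the region starting min_words_cut words into it
        # (except the first) and ending min_words_cut words into the next
        # window (except the last), so edge predictions with little context
        # are discarded and every token is labeled exactly once.
        lo = 0 if start == 0 else min_words_cut
        hi = len(chunk) if last else step + min_words_cut
        merged.extend(zip(chunk[lo:hi], labels[lo:hi]))
        if last:
            return merged
        start += step
```

Sweeping min_words_cut from 0 up to the overlap reproduces the kind of hyper-parameter study described in the quote.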
“…As we can see with the help of Figure 5 (F1-score for different values of min_words_cut), the score peaks in the middle range of overlap size (4–10). Predictions of uppercase and lowercase are stable and independent of min_words_cut, whereas the question mark is quite sensitive to this hyper-parameter.…”
Section: Evaluation Metric
confidence: 99%
“…Although it was originally developed for the alignment of amino acid sequences in proteins, the fact that the type of symbol being aligned is irrelevant has allowed it to be used in multiple fields, among others the location of similarities between sequences of words. The Needleman–Wunsch algorithm is used today in different domains: as a tool in a punctuation prediction model for conversational speech [33], as a support technique for large-scale computerized text analysis in political science [34], and to help with automatic corpus creation for Wikipedia [35].…”
Section: Literature Review
confidence: 99%
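For context, the Needleman–Wunsch algorithm referenced above is a global dynamic-programming alignment; a minimal word-level sketch follows (the match/mismatch/gap scores are illustrative defaults, not values from any of the cited papers):

```python
# Minimal Needleman–Wunsch global alignment over two word sequences.
# Scoring values are illustrative, not taken from the cited work.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback: recover one optimal alignment as (word_or_None, word_or_None) pairs.
    aligned, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned.append((a[i - 1], None)); i -= 1
        else:
            aligned.append((None, b[j - 1])); j -= 1
    return list(reversed(aligned))

# Usage, e.g. aligning two word sequences:
# needleman_wunsch("so how are you".split(), "how old are you".split())
```

This kind of word-level alignment can, for example, be used to match an automatic transcript against a punctuated reference text.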
“…Under those conditions, all programs should be delayed by the 20 s that the authors have proposed. In [33] a delay of 15 s was proposed, but it has been increased to 20 s in order to cover most cases without delaying the broadcast excessively. This solution would allow standard tuners to be used, without the adaptation described in the following paragraph, but there is a certain reticence on the part of broadcasters to implement this type of audio-visual manipulation.…”
Section: Erase Time Calculation
confidence: 99%