Proceedings of the First Workshop on Natural Language Processing for Medical Conversations 2020
DOI: 10.18653/v1/2020.nlpmc-1.8
Robust Prediction of Punctuation and Truecasing for Medical ASR

Abstract: Automatic speech recognition (ASR) systems in the medical domain that focus on transcribing clinical dictations and doctor-patient conversations often pose many challenges due to the complexity of the domain. ASR output typically undergoes automatic punctuation to enable users to speak naturally, without having to vocalise awkward and explicit punctuation commands, such as "period", "add comma" or "exclamation point", while truecasing enhances user readability and improves the performance of downstream NLP tasks…
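To make the task setup concrete, below is a minimal sketch that treats punctuation restoration and truecasing as per-token tagging over lowercase, unpunctuated ASR output. The tag sets, example utterance, and rendering helper are illustrative assumptions, not the paper's label inventory.

```python
# Illustrative only: punctuation restoration and truecasing as per-token tagging
# over lowercase, unpunctuated ASR output. Tag names are hypothetical.
PUNCT_TAGS = {"O": "", "PERIOD": ".", "COMMA": ",", "QUESTION": "?"}

def render(tokens, punct_tags, case_tags):
    """Apply predicted punctuation and casing tags to raw ASR tokens."""
    out = []
    for tok, p, c in zip(tokens, punct_tags, case_tags):
        if c == "UPPER_INIT":
            tok = tok.capitalize()
        elif c == "ALL_CAPS":
            tok = tok.upper()
        out.append(tok + PUNCT_TAGS[p])
    return " ".join(out)

# Hypothetical ASR hypothesis and per-token predictions.
tokens = ["patient", "denies", "chest", "pain", "blood", "pressure", "is", "stable"]
punct  = ["O", "O", "O", "PERIOD", "O", "O", "O", "PERIOD"]
case   = ["UPPER_INIT", "LOWER", "LOWER", "LOWER", "UPPER_INIT", "LOWER", "LOWER", "LOWER"]
print(render(tokens, punct, case))
# -> Patient denies chest pain. Blood pressure is stable.
```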

Cited by 22 publications (19 citation statements)
References 23 publications
“…We instead use a single prediction for each token, and we find that we can achieve superior performance using much smaller context windows than [1]. Finally, [17,18] apply transformers to punctuation prediction using lexical features and prosodic features, which are aligned using pre-trained feature extractors and alignment networks. In contrast to [17,18], we use forced alignment from ASR and learn acoustic features from scratch from the spectrogram segments corresponding to each text token.…”
Section: Related Work (mentioning)
confidence: 99%
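As a rough sketch of the lexical–acoustic fusion this excerpt describes: each token's word embedding is concatenated with an acoustic vector pooled from the spectrogram segment aligned to that token (e.g., via ASR forced alignment). The module, its dimensions, and the mean-pooling are assumptions for illustration, not the cited papers' exact architecture.

```python
import torch
import torch.nn as nn

class TokenPunctuator(nn.Module):
    """Per-token punctuation classifier fusing lexical and acoustic features."""
    def __init__(self, vocab_size=10000, n_mels=80, d_model=256, n_punct=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.acoustic_proj = nn.Linear(n_mels, d_model)   # frame-wise projection
        self.classifier = nn.Linear(2 * d_model, n_punct)

    def forward(self, token_ids, segments):
        # token_ids: (batch, seq_len) word indices from the ASR hypothesis.
        # segments:  nested list [batch][seq_len] of (frames, n_mels) tensors,
        #            one spectrogram slice per token from forced alignment.
        lex = self.word_emb(token_ids)                                   # (B, T, d)
        acoustic = torch.stack([
            torch.stack([self.acoustic_proj(seg).mean(dim=0) for seg in utt])
            for utt in segments
        ])                                                               # (B, T, d)
        fused = torch.cat([lex, acoustic], dim=-1)                       # (B, T, 2d)
        return self.classifier(fused)                                    # punctuation logits

# Toy usage with random inputs (shapes only).
model = TokenPunctuator()
ids = torch.randint(0, 10000, (1, 3))
segs = [[torch.randn(12, 80), torch.randn(9, 80), torch.randn(20, 80)]]
print(model(ids, segs).shape)   # torch.Size([1, 3, 4])
```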
“…However, the performance of simple n-gram language models suffers when long-range lexical information is required to disambiguate between punctuation classes [10]. Joint modelling of the truecasing and punctuation tasks is considered in [11,12] using deep learning models in a classification framework. The authors of [11] treat punctuation as an independent task and truecasing as conditionally dependent on punctuation, given a latent representation of the input.…”
Section: Related Work (mentioning)
confidence: 99%
“…Joint modelling of the truecasing and punctuation tasks is considered in [11,12] using deep learning models in a classification framework. The authors of [11] treat punctuation as an independent task and truecasing as conditionally dependent on punctuation, given a latent representation of the input. In [12], however, it is treated as a multi-task problem in which truecasing and punctuation are both independent given the input's latent representation.…”
Section: Related Work (mentioning)
confidence: 99%
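The two joint-modelling variants contrasted in these excerpts can be sketched as a shared encoder with two classification heads: in the multi-task setting (the [12]-style formulation) both heads are independent given the shared latent representation, while in the conditional setting (the [11]-style formulation) the truecasing head also consumes the punctuation logits. The BiLSTM encoder and all dimensions below are illustrative assumptions, not either paper's exact model.

```python
import torch
import torch.nn as nn

class JointPunctCase(nn.Module):
    """Joint punctuation and truecasing over a shared latent representation."""
    def __init__(self, vocab_size=10000, d_model=256, n_punct=4, n_case=3, conditional=False):
        super().__init__()
        self.conditional = conditional
        self.emb = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True, bidirectional=True)
        self.punct_head = nn.Linear(2 * d_model, n_punct)
        case_in = 2 * d_model + (n_punct if conditional else 0)
        self.case_head = nn.Linear(case_in, n_case)

    def forward(self, token_ids):
        latent, _ = self.encoder(self.emb(token_ids))   # shared latent: (B, T, 2d)
        punct_logits = self.punct_head(latent)
        # Conditional variant: the truecasing head also sees the punctuation prediction.
        case_input = torch.cat([latent, punct_logits], dim=-1) if self.conditional else latent
        case_logits = self.case_head(case_input)
        return punct_logits, case_logits                # per-token logits for both tasks
```

Training would typically sum a per-token cross-entropy loss for each head; in this sketch, the practical difference between the variants is whether the truecasing loss also back-propagates through the punctuation head's output.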
“…Word-based truecasing has been the dominant approach since the introduction of the task by Lita et al. (2003). Word-based models can be further categorized into generative models such as HMMs (Lita et al., 2003; Gravano et al., 2009; Beaufays and Strope, 2013; Nebhi et al., 2015) and discriminative models such as Maximum-Entropy Markov Models (Chelba and Acero, 2004), Conditional Random Fields (Wang et al., 2006), and, most recently, Transformer neural network models (Nguyen et al., 2019; Rei et al., 2020; Sunkara et al., 2020). Word-based models need to refine the class of mixed-case words because there is a combinatorial number of possible case mixings for a word (e.g., LaTeX).…”
Section: Related Work (mentioning)
confidence: 99%
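For intuition, here is a tiny frequency-based, word-level truecasing baseline in the spirit of the word-based approaches surveyed above (not any cited system's implementation): each word is restored to its most frequent surface form observed in cased text, which also recovers mixed-case forms such as "LaTeX" without enumerating case-mixing patterns. It ignores sentence position and out-of-vocabulary words, which the discriminative and neural models above are designed to handle.

```python
from collections import Counter, defaultdict

def build_case_lexicon(cased_lines):
    """Map each lowercased word to its most frequent observed surface form."""
    counts = defaultdict(Counter)
    for line in cased_lines:
        for word in line.split():
            counts[word.lower()][word] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def truecase(tokens, lexicon):
    """Restore casing for lowercase ASR tokens; unknown words pass through."""
    return [lexicon.get(t.lower(), t) for t in tokens]

# Hypothetical cased training lines and lowercase ASR output.
lexicon = build_case_lexicon(["We typeset the paper in LaTeX .",
                              "The patient saw Dr. Smith ."])
print(truecase(["we", "typeset", "in", "latex"], lexicon))
# -> ['We', 'typeset', 'in', 'LaTeX']
```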