Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection

Chen, Qian; Chen, Mengzhe; Li, Bo; Wang, Wen

doi:10.1109/icassp40776.2020.9053159

Cited by 23 publications

(25 citation statements)

References 20 publications

“…
• machine translation task where, using an input sequence without punctuation, an output sequence of punctuation marks (or text including punctuation marks) is predicted [5,6,7]
• sequence tagging task in which for each input, a probability distribution across possible punctuation marks is predicted [1,2,3,4,8]
• sequence classification task in which for each sequence, a probability distribution across possible punctuation marks for a fixed location within the sequence is predicted [9]

state-of-the-art results add a classification head to a pretrained transformer [4,12] and fine-tune on the IWSLT11 train dataset using a sequence tagging task. We hypothesise the sequence tagging, rather than classification, is used because the tagging approach can predict w punctuation marks at once, where w is the window size used.…”

Section: Related Work (mentioning)

confidence: 99%

“…The closer the target latency is to 0 words, the more often inference has to be conducted, making the tagging approach similar to the classification one in efficiency. There is recent work on punctuation prediction with a focus on the real-time use case, but with a tagging rather than classification approach [13,2]. Nguyen et al [13] create a model for fast punctuation prediction, with a latency of 20 words [2].…”

Section: Related Work (mentioning)

confidence: 99%

“…There is recent work on punctuation prediction with a focus on the real-time use case, but with a tagging rather than classification approach [13,2]. Nguyen et al [13] create a model for fast punctuation prediction, with a latency of 20 words [2]. Chen et al introduce a controllable time-delay transformer [2] with a latency of 10 words and comparable performance.…”

Section: Related Work (mentioning)

confidence: 99%

“…When the aforementioned downstream tasks have a real-time requirement on their own, real-time punctuation prediction is required as well. The currently prevalent approach for punctuation prediction is a sequence tagging one, where for each word in a window, a punctuation class (tag) is predicted [1,2,3,4]. Although these models are extensively evaluated, little information exists on their performance when there are real-time constraints in the form of limited right-side context.…”

Section: Introduction (mentioning)

confidence: 99%

“…In this work, we introduce mask-combine decoding, which a) can be used to impose real-time constraints on current state-of-the-art models used for punctuation prediction utilising tagging approaches b) unifies several decoding approaches in one framework, allowing decoding to be treated as a set of hyper-parameters c) introduces the combination of overlapping probability distributions which leads to an incremental improvement when compared to previous decoding strategies. We obtain results using the aforementioned tagging approach prevalent in previous work [1,2,3,4], but use our decoding strategy with parameters limiting right-side context to simulate a real-time use case. This makes it possible to compare previous techniques with a novel sequence classification approach for punctuation prediction, in which only one punctuation mark at a specific location with limited lookahead is predicted for each sequence.…”

Section: Introduction (mentioning)

confidence: 99%

Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints

Christoph¹, Klejch², Bell³

2021

Preprint

In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows. We show that significant improvements can be achieved by optimising these strategies after training a model, only leading to a potential increase in inference time, with no requirement for retraining. We further use our decoding strategy framework for the first comparison of tagging and classification approaches for punctuation prediction in a real-time setting. Our results show that a classification approach for punctuation prediction can be beneficial when little or no right-side context is available.
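
The idea of using multiple predictions for each word across different windows can be illustrated with a minimal sketch. It is not the authors' implementation: the combination rule (a plain average), the label set `LABELS`, the function `combine_windows`, and the stub model `toy_predict` are all assumptions made for illustration, and the abstract does not state how the distributions are actually combined.

```python
from typing import Callable, List

# Hypothetical label set; the listing above does not specify one.
LABELS = ["O", "COMMA", "PERIOD"]


def combine_windows(
    n_words: int,
    window: int,
    stride: int,
    predict: Callable[[int, int], List[List[float]]],
) -> List[str]:
    """Tag each word by averaging its distributions over all covering windows.

    `predict(start, end)` stands in for a tagging model and must return one
    probability distribution over LABELS for each word in [start, end).
    Averaging is an assumed combination rule, chosen for simplicity.
    """
    if not 1 <= stride <= window:
        raise ValueError("stride must be between 1 and window")

    sums = [[0.0] * len(LABELS) for _ in range(n_words)]
    counts = [0] * n_words

    start = 0
    while True:
        end = min(start + window, n_words)
        for offset, dist in enumerate(predict(start, end)):
            i = start + offset
            counts[i] += 1
            for k, p in enumerate(dist):
                sums[i][k] += p
        if end == n_words:
            break
        start += stride

    tags = []
    for i in range(n_words):
        mean = [s / counts[i] for s in sums[i]]
        tags.append(LABELS[mean.index(max(mean))])
    return tags


def toy_predict(start: int, end: int) -> List[List[float]]:
    """Stub model: word 2 looks like a comma only when it sits at the
    right edge of the window that starts at 0 (no right-side context)."""
    out = []
    for i in range(start, end):
        if i == 3:
            out.append([0.1, 0.1, 0.8])
        elif i == 2 and start == 0:
            out.append([0.3, 0.7, 0.0])
        elif i == 2:
            out.append([0.9, 0.1, 0.0])
        else:
            out.append([0.8, 0.1, 0.1])
    return out


# Non-overlapping windows: each word is predicted exactly once.
single = combine_windows(4, window=3, stride=3, predict=toy_predict)
# Overlapping windows: words 1 and 2 are predicted twice and averaged.
overlapped = combine_windows(4, window=3, stride=1, predict=toy_predict)
```

In this toy setup, overlapping windows change the tag for word 2 because the second window sees it with right-side context. A smaller stride means more model calls per word, which matches the abstract's remark that such strategies may increase inference time without requiring retraining.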