ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053159
Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection

Abstract: With the increased applications of automatic speech recognition (ASR) in recent years, it is essential to automatically insert punctuation marks and remove disfluencies in transcripts, to improve the readability of the transcripts as well as the performance of subsequent applications, such as machine translation, dialogue systems, and so forth. In this paper, we propose a Controllable Time-delay Transformer (CT-Transformer) model that jointly completes the punctuation prediction and disfluency detection tasks …

Cited by 23 publications (25 citation statements) · References 20 publications
“…• machine translation task, where, from an input sequence without punctuation, an output sequence of punctuation marks (or text including punctuation marks) is predicted [5,6,7]
• sequence tagging task, in which a probability distribution across possible punctuation marks is predicted for each input token [1,2,3,4,8]
• sequence classification task, in which, for each sequence, a probability distribution across possible punctuation marks is predicted for a fixed location within the sequence [9]

State-of-the-art results add a classification head to a pretrained transformer [4,12] and fine-tune on the IWSLT11 train dataset using a sequence tagging task. We hypothesise that sequence tagging, rather than classification, is used because the tagging approach can predict w punctuation marks at once, where w is the window size used.…”
Section: Related Work
confidence: 99%
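The tagging framing quoted above — one punctuation label per input token, so a window of w tokens yields w predictions per forward pass — can be sketched as follows. This is a minimal illustration, not the cited systems' implementation; the label set, the `tag_window` helper, and the stand-in `toy_model` are all hypothetical.

```python
# Hypothetical sketch of punctuation prediction as sequence tagging:
# the model emits one score row per token, and each token receives the
# punctuation label with the highest score.
PUNCT_LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]

def tag_window(tokens, model):
    """Return one punctuation label per token (the tagging framing)."""
    logits = model(tokens)  # shape: (len(tokens), len(PUNCT_LABELS))
    return [PUNCT_LABELS[max(range(len(row)), key=row.__getitem__)]
            for row in logits]

# Toy stand-in for a trained model: predicts PERIOD after the final
# token of the window and "no punctuation" (O) everywhere else.
def toy_model(tokens):
    rows = [[1.0, 0.0, 0.0, 0.0] for _ in tokens]
    rows[-1] = [0.0, 0.0, 1.0, 0.0]
    return rows

print(tag_window(["hello", "world"], toy_model))
# → ['O', 'PERIOD']
```

A classification-framed model would instead score punctuation for a single fixed position per sequence, requiring one forward pass per prediction rather than w predictions per pass.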
“…The closer the target latency is to 0 words, the more often inference has to be conducted, making the tagging approach similar to the classification one in efficiency. There is recent work on punctuation prediction with a focus on the real-time use case, but with a tagging rather than a classification approach [13,2]. Nguyen et al. [13] create a model for fast punctuation prediction, with a latency of 20 words [2].…”
Section: Related Work
confidence: 99%
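The latency/efficiency trade-off described in this statement can be made concrete with a small sketch. Assuming a tagging model is re-run once per latency budget of k incoming words (a simplification; the `inference_calls` helper is hypothetical and not from the cited work), the number of model invocations grows as the latency budget shrinks, converging on one call per word — the cost profile of the classification framing.

```python
# Hypothetical sketch: inference frequency as a function of target
# latency for a tagging-based real-time punctuator. With a budget of
# latency_words, the model is invoked once per latency_words incoming
# words; as the budget approaches 1 word, this degenerates to one
# inference per word, as in the classification framing.
def inference_calls(stream_len, latency_words):
    """Model invocations needed to cover a stream of stream_len words."""
    return -(-stream_len // latency_words)  # ceiling division

print(inference_calls(100, 20))  # → 5 calls at a 20-word latency
print(inference_calls(100, 1))   # → 100 calls at a 1-word latency
```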