“…• machine translation task, where, given an input sequence without punctuation, an output sequence of punctuation marks (or text including punctuation marks) is predicted [5,6,7]
• sequence tagging task, in which a probability distribution across possible punctuation marks is predicted for each input token [1,2,3,4,8]
• sequence classification task, in which a probability distribution across possible punctuation marks is predicted for a fixed location within each sequence [9]

State-of-the-art approaches add a classification head to a pretrained transformer [4,12] and fine-tune on the IWSLT11 training set using a sequence tagging task. We hypothesise that sequence tagging, rather than sequence classification, is used because the tagging approach can predict w punctuation marks at once, where w is the window size used.…”
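The hypothesis about predicting w punctuation marks at once can be made concrete by counting model forward passes. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and it assumes non-overlapping windows for tagging, which the text does not specify.

```python
import math


def tagging_passes(n_tokens: int, window: int) -> int:
    """Forward passes for sequence tagging.

    Assumption: windows do not overlap, and each pass yields one
    punctuation distribution for every token in its window.
    """
    return math.ceil(n_tokens / window)


def classification_passes(n_tokens: int) -> int:
    """Forward passes for sequence classification.

    Each pass yields a distribution for a single fixed location in
    its window, so one pass is needed per token position.
    """
    return n_tokens


def split_into_windows(tokens: list, window: int) -> list:
    """Split tokens into consecutive non-overlapping windows of size `window`."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


if __name__ == "__main__":
    tokens = "hello how are you i am fine thanks".split()
    w = 4
    print(split_into_windows(tokens, w))
    print("tagging passes:", tagging_passes(len(tokens), w))
    print("classification passes:", classification_passes(len(tokens)))
```

Under these assumptions, tagging needs roughly n/w passes for n tokens while classification needs n. Overlapping windows, which are sometimes used to give every token context on both sides, would narrow this gap.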