ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413492

Sentence Boundary Augmentation for Neural Machine Translation Robustness

Abstract: Neural Machine Translation (NMT) models have demonstrated strong, state-of-the-art performance on translation tasks where well-formed training and evaluation data are provided, but they remain sensitive to inputs that contain errors of various types. Specifically, in long-form speech translation systems, where the input transcripts come from Automatic Speech Recognition (ASR), the NMT models have to handle errors including phoneme substitutions, grammatical structure, and sentence boundaries, all…
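The abstract is truncated here, so the following is not the paper's procedure. It is a minimal sketch, under our own assumptions, of one way to inject sentence-boundary errors into parallel training data. Runs of consecutive (source, target) pairs are merged at random, so the model sees inputs that span several sentences. The source side can then be stripped of casing and final punctuation, the way raw ASR text might look. The function names, probabilities, and the merge-only design are hypothetical. Simulating over-segmentation, where a sentence is split mid-way, would also require word alignments to split the target consistently, so that case is left out.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def merge_boundaries(pairs: List[Pair], merge_prob: float = 0.3,
                     max_merge: int = 3, seed: int = 0) -> List[Pair]:
    """Randomly merge runs of consecutive sentence pairs (hypothetical helper).

    Simulates a segmenter that misses sentence boundaries. Source and target
    sides are merged together, so each output pair remains a valid translation.
    """
    rng = random.Random(seed)
    out: List[Pair] = []
    i = 0
    while i < len(pairs):
        n = 1
        # Keep dropping the next boundary with probability merge_prob,
        # up to max_merge sentences per segment.
        while n < max_merge and i + n < len(pairs) and rng.random() < merge_prob:
            n += 1
        chunk = pairs[i:i + n]
        out.append((" ".join(src for src, _ in chunk),
                    " ".join(tgt for _, tgt in chunk)))
        i += n
    return out


def strip_boundary_cues(text: str) -> str:
    """Lowercase and drop sentence-final punctuation, mimicking raw ASR text."""
    tokens = (tok.rstrip(".?!") for tok in text.lower().split())
    return " ".join(tok for tok in tokens if tok)


if __name__ == "__main__":
    doc = [("Hello.", "Hallo."),
           ("How are you?", "Wie geht es dir?"),
           ("Fine.", "Gut.")]
    for src, tgt in merge_boundaries(doc, merge_prob=0.5):
        print(strip_boundary_cues(src), "|||", tgt)
```

Because merging touches both sides at once, the augmented pairs stay correct translations. Only the segmentation changes.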

Cited by 3 publications (7 citation statements)
References 10 publications
“…The ASR WER on the test sentences is 9.0%. … approach in (Li et al., 2021). According to Table 5, our results yielded a BLEU score of 27.1, which is similar to the score of 27.0 reported in Table 4 of that paper, which represents their best result from training with synthetic segment breaks.…”
Section: IWSLT Results (supporting)
confidence: 83%
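The 9.0% figure in the statement above is a word error rate (WER). As a reminder of what it measures, here is a minimal sketch of the standard definition: the word-level Levenshtein distance between the reference transcript and the ASR hypothesis, divided by the number of reference words. A 9.0% WER therefore means roughly 9 word errors per 100 reference words. The citing paper does not say which scoring tool it used. Real scorers also normalize casing and punctuation before counting, which this sketch does not do.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds one DP row: prev[j] = edit distance between ref[:i] and hyp[:j].
    prev = list(range(len(hyp) + 1))  # row i = 0: j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)   # column j = 0: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, "the cat sat on the mat" against "the cat sit on mat" has one substitution and one deletion, giving 2/6 ≈ 0.33. Because insertions count as errors, WER can exceed 1.0.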
“…Finally, we train on (projected-human-source, projected-gold-translation) pairs. This is similar to how artificial target sentences were constructed by Li et al. (2021), but in our case, the boundaries are determined by automatic punctuation on ASR output, rather than from introducing boundary errors at random.…”
Section: Gold Dementioning
confidence: 72%
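The statement above contrasts random boundary errors with boundaries taken from automatic punctuation on ASR output. The sketch below illustrates that second setup under loudly stated assumptions. First, the punctuated transcript is split after sentence-final punctuation. Second, a gold translation is divided into one piece per segment in proportion to word counts. The citing paper does not say how its projection works. The length-proportional split in the hypothetical `project_by_length` is only a crude stand-in for projection via word alignments, and both function names are our own.

```python
import re
from typing import List

# A segment boundary is whitespace that follows sentence-final punctuation.
_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def segment_by_punctuation(punctuated_asr: str) -> List[str]:
    """Split an automatically punctuated ASR transcript into segments."""
    return [seg for seg in _BOUNDARY.split(punctuated_asr.strip()) if seg]


def project_by_length(segments: List[str], target: str) -> List[str]:
    """Split `target` into one piece per source segment, proportionally to the
    segments' word counts. A crude stand-in for projection via word alignment."""
    seg_lens = [len(s.split()) for s in segments]
    total = sum(seg_lens)
    if total == 0:
        return ["" for _ in segments]
    tgt = target.split()
    pieces: List[str] = []
    start, cum = 0, 0
    for n in seg_lens:
        cum += n
        end = round(cum * len(tgt) / total)  # last segment always ends at len(tgt)
        pieces.append(" ".join(tgt[start:end]))
        start = end
    return pieces
```

Proportional splitting ignores word-order differences between languages, so it can misplace boundaries. That limitation is why alignment-based projection would normally be preferred.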
“…Peng et al. (2020) propose dictionary-based DA (DDA) for cross-domain NMT by synthesizing a domain-specific dictionary and automatically generating a pseudo in-domain parallel corpus. Li et al. (2020a) present a DA method using sentence boundary segmentation to improve the robustness of NMT on ASR transcripts. Nishimura et al. (2018) introduce DA methods for multi-source NMT that fill in incomplete portions of multi-source training data.…”
Section: Appendix A, Useful Blog Posts and Code Repositories (mentioning)
confidence: 99%