Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1461

Revisiting Character-Based Neural Machine Translation with Capacity and Compression

Abstract: Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering. However, it results in longer sequences in which each symbol contains less information, creating both modeling and computational challenges. In this paper, we show that the modeling problem can be solved by standard sequence-to-sequence architectures of sufficient depth, and that …
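
A rough way to see the length/information tradeoff the abstract describes is to tokenize the same string at the character level and at the subword level. The sketch below does exactly that; the subword segmentation is hand-picked for illustration (the `@@` continuation markers follow the common BPE convention) and is not produced by the paper's actual pipeline.

```python
# Sketch: character-level vs. subword-level views of the same input.
# The subword segmentation below is hand-picked for illustration only.
sentence = "translating characters"

# Character level: every symbol, including the space, is a token.
char_tokens = list(sentence)

# Subword level: a plausible BPE-style segmentation (assumed, not learned).
subword_tokens = ["translat@@", "ing", "character@@", "s"]

print(len(char_tokens))     # 22 tokens: longer sequence, less info per symbol
print(len(subword_tokens))  # 4 tokens: shorter sequence, more info per symbol
```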

Cited by 85 publications (102 citation statements)
References 19 publications
“…Moreover, they are typically deployed as a pre-processing step before training the NMT model; hence, the predicted set of subword units is essentially not optimized for the translation task. Recently, Cherry et al. (2018) extended the subword-based approach to NMT by implementing the translation model directly at the level of characters, which can reach performance comparable to the subword-based model, although it requires much larger networks that may be more difficult to train. The main reason for this requirement may be that treating characters as individual tokens at the same level, and processing the input sequences in linear time, increases the difficulty of the learning task, since translation must then be modeled as a mapping between the characters of two languages.…”
Section: Introduction (mentioning)
confidence: 99%
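
The quote's point that subword units are fixed before translation training starts, and are therefore not optimized for the translation objective, is easy to see in code. Below is a minimal sketch of BPE merge learning in the style of Sennrich et al. (2016); the word list and merge count are toy assumptions, and real implementations (e.g. subword-nmt) add frequency thresholds and other refinements.

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merge operations from word frequencies.

    This runs as a corpus preprocessing step, entirely decoupled from
    (and hence not optimized for) the downstream translation objective.
    """
    # Represent each word as a tuple of symbols plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe_merges(["low", "lower", "lowest", "newer", "wider"], 10))
```

Nothing in this loop ever sees the translation objective; the merges depend only on monolingual symbol statistics, which is exactly the sense in which the resulting units are "not optimized for the translation task."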
“…As discussed above, we believe its performance is due to its ability to cope with morphology. Networks using character CNNs have proven a robust alternative for dealing with morphology in different tasks (Cao and Rei, 2016; Cherry et al., 2018), and we believe this ability is of particular importance for the task at hand. Our analysis showed that networks considering morphology were better able to categorise misinformation.…”
Section: Discussion (mentioning)
confidence: 99%
“…This result is particularly remarkable considering that networks (1) and (2) have around 60x more parameters than the character CNN. Cao and Rei (2016) and Cherry et al. (2018) suggest that character CNNs are better equipped than the other networks to cope with morphology. We put forward that this characteristic would be useful when dealing with the particularities of misinformation in online social media.…”
Section: Methods (mentioning)
confidence: 99%
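
To make the two quotes above concrete, here is a sketch of a character-CNN word encoder in the spirit of the cited works: convolution filters over character embeddings with max-over-time pooling. All sizes (character vocabulary, embedding dimension, filter counts and widths) are illustrative assumptions, not the cited papers' configurations.

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Character-CNN word encoder sketch; all sizes assumed for illustration."""

    def __init__(self, n_chars=100, char_dim=16, n_filters=64, widths=(3, 4, 5)):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # One 1-D convolution per filter width. Max-over-time pooling lets each
        # filter respond to morpheme-like character n-grams (prefixes, suffixes,
        # stems) wherever they occur in the word.
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, w, padding=w - 1) for w in widths
        )

    def forward(self, char_ids):                        # (batch, max_word_len)
        x = self.char_emb(char_ids).transpose(1, 2)     # (batch, char_dim, len)
        pooled = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)                 # (batch, n_filters * len(widths))

enc = CharCNNWordEncoder()
word = torch.randint(1, 100, (1, 7))                    # one word as 7 character ids
print(enc(word).shape)                                  # torch.Size([1, 192])
print(sum(p.numel() for p in enc.parameters()))         # ~14k parameters
```

The compactness follows from the encoder's size being independent of the word vocabulary: with these toy sizes it has roughly 14k parameters, whereas a word-level embedding table grows with vocabulary size, which is consistent with the parameter gap the quote reports.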
“…The way translation input is represented has been shown to affect both performance and how much data the model requires to train (Sennrich et al., 2016; Salesky et al., 2018; Cherry et al., 2018). The current standard approach for text-based translation is to segment words into subword units as a preprocessing step (Sennrich et al., 2016).…”
Section: Introduction (mentioning)
confidence: 99%
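
Complementing the merge-learning sketch earlier, the snippet below shows the application side of the standard preprocessing step this quote refers to: segmenting a word with an already-learned merge list, before any translation model sees the data. The merge list here is a toy example, not one learned from a real corpus.

```python
def apply_bpe(word, merges):
    """Segment one word with a learned BPE merge list (applied in merge order).
    Minimal sketch of the standard subword preprocessing step."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:  # merges must be applied in the order they were learned
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

toy_merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("er", "</w>")]
print(apply_bpe("lower", toy_merges))  # ['low', 'er</w>']
```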