Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1461

Revisiting Character-Based Neural Machine Translation with Capacity and Compression

Abstract: Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering. However, it results in longer sequences in which each symbol contains less information, creating both modeling and computational challenges. In this paper, we show that the modeling problem can be solved by standard sequence-to-sequence architectures of sufficient depth, and that …
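
A rough way to see the length/information tradeoff the abstract describes is to tokenize the same string at the character level and at the subword level. The sketch below does exactly that; the subword segmentation is hand-picked for illustration (the `@@` continuation markers follow the common BPE convention) and is not produced by the paper's actual pipeline.

```python
# Sketch: character-level vs. subword-level views of the same input.
# The subword segmentation below is hand-picked for illustration only.
sentence = "translating characters"

# Character level: every symbol, including the space, is a token.
char_tokens = list(sentence)

# Subword level: a plausible BPE-style segmentation (assumed, not learned).
subword_tokens = ["translat@@", "ing", "character@@", "s"]

print(len(char_tokens))     # 22 tokens: longer sequence, less info per symbol
print(len(subword_tokens))  # 4 tokens: shorter sequence, more info per symbol
```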

Cited by 85 publications (102 citation statements)
References 19 publications
“…Moreover, they are typically deployed as a pre-processing step before training the NMT model; hence, the predicted set of subword units is essentially not optimized for the translation task. Recently, Cherry et al. (2018) extended the subword-based approach to NMT by implementing the translation model directly at the level of characters, which can reach performance comparable to the subword-based model, although it requires much larger networks that may be more difficult to train. The main reason for this requirement may be that treating characters as individual tokens at the same level, and processing the input sequences in linear time, increases the difficulty of the learning task, since translation must then be modeled as a mapping between the characters of two languages.…”
Section: Introduction (mentioning)
confidence: 99%
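
The quote's point that subword units are fixed before translation training starts, and are therefore not optimized for the translation objective, is easy to see in code. Below is a minimal sketch of BPE merge learning in the style of Sennrich et al. (2016); the word list and merge count are toy assumptions, and real implementations (e.g. subword-nmt) add frequency thresholds and other refinements.

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merge operations from word frequencies.

    This runs as a corpus preprocessing step, entirely decoupled from
    (and hence not optimized for) the downstream translation objective.
    """
    # Represent each word as a tuple of symbols plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe_merges(["low", "lower", "lowest", "newer", "wider"], 10))
```

Nothing in this loop ever sees the translation objective; the merges depend only on monolingual symbol statistics, which is exactly the sense in which the resulting units are "not optimized for the translation task."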
“…As discussed above, we believe its performance is due to its ability to cope with morphology. Networks using character CNNs have proven a robust alternative for dealing with morphology in different tasks (Cao and Rei, 2016; Cherry et al., 2018), and we believe this ability is of particular importance for the task at hand. Our analysis showed that networks considering morphology were better able to categorise misinformation.…”
Section: Discussion (mentioning)
confidence: 99%
“…This result is particularly remarkable considering that networks (1) and (2) have around 60x more parameters than the character CNN. Cao and Rei (2016) and Cherry et al. (2018) suggest that character CNNs are better equipped than the other networks to cope with morphology. We put forward that this characteristic would be useful when dealing with the particularities of misinformation in online social media.…”
Section: Methods (mentioning)
confidence: 99%
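
To make the two quotes above concrete, here is a sketch of a character-CNN word encoder in the spirit of the cited works: convolution filters over character embeddings with max-over-time pooling. All sizes (character vocabulary, embedding dimension, filter counts and widths) are illustrative assumptions, not the cited papers' configurations.

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Character-CNN word encoder sketch; all sizes assumed for illustration."""

    def __init__(self, n_chars=100, char_dim=16, n_filters=64, widths=(3, 4, 5)):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # One 1-D convolution per filter width. Max-over-time pooling lets each
        # filter respond to morpheme-like character n-grams (prefixes, suffixes,
        # stems) wherever they occur in the word.
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, w, padding=w - 1) for w in widths
        )

    def forward(self, char_ids):                        # (batch, max_word_len)
        x = self.char_emb(char_ids).transpose(1, 2)     # (batch, char_dim, len)
        pooled = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)                 # (batch, n_filters * len(widths))

enc = CharCNNWordEncoder()
word = torch.randint(1, 100, (1, 7))                    # one word as 7 character ids
print(enc(word).shape)                                  # torch.Size([1, 192])
print(sum(p.numel() for p in enc.parameters()))         # ~14k parameters
```

The compactness follows from the encoder's size being independent of the word vocabulary: with these toy sizes it has roughly 14k parameters, whereas a word-level embedding table grows with vocabulary size, which is consistent with the parameter gap the quote reports.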
“…The way translation input is represented has been shown to affect both performance and how much data the model requires to train (Sennrich et al., 2016; Salesky et al., 2018; Cherry et al., 2018). The current standard approach for text-based translation is to segment words into subword units as a preprocessing step (Sennrich et al., 2016).…”
Section: Introduction (mentioning)
confidence: 99%
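
Complementing the merge-learning sketch earlier, the snippet below shows the application side of the standard preprocessing step this quote refers to: segmenting a word with an already-learned merge list, before any translation model sees the data. The merge list here is a toy example, not one learned from a real corpus.

```python
def apply_bpe(word, merges):
    """Segment one word with a learned BPE merge list (applied in merge order).
    Minimal sketch of the standard subword preprocessing step."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:  # merges must be applied in the order they were learned
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

toy_merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("er", "</w>")]
print(apply_bpe("lower", toy_merges))  # ['low', 'er</w>']
```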