Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016
DOI: 10.18653/v1/P16-1100
Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models

Abstract: Nearly all previous work on neural machine translation (NMT) has used quite restricted vocabularies, perhaps with a subsequent method to patch in unknown words. This paper presents a novel word-character solution to achieving open vocabulary NMT. We build hybrid systems that translate mostly at the word level and consult the character components for rare words. Our character-level recurrent neural networks compute source word representations and recover unknown target words when needed. The twofold advantage of…
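As a rough illustration of the hybrid mechanism the abstract describes, the sketch below is my own PyTorch code, not the authors' implementation; the class and parameter names (HybridWordEncoder, is_rare, etc.) are hypothetical. Frequent words get ordinary word embeddings, while a rare or unknown word is composed from its characters by an LSTM, whose final hidden state stands in as the word representation.

```python
import torch
import torch.nn as nn

class HybridWordEncoder(nn.Module):
    """Hedged sketch of the hybrid word-character idea (names are illustrative)."""

    def __init__(self, word_vocab, char_vocab, dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, dim)   # for frequent words
        self.char_emb = nn.Embedding(char_vocab, dim)   # for spelling out rare words
        self.char_rnn = nn.LSTM(dim, dim, batch_first=True)

    def embed_word(self, word_id, char_ids, is_rare):
        if is_rare:
            # char_ids: (1, word_length) tensor of character indices;
            # the LSTM's final hidden state becomes the word vector.
            _, (h, _) = self.char_rnn(self.char_emb(char_ids))
            return h[-1]                      # (1, dim) character-composed vector
        return self.word_emb(word_id)         # (1, dim) ordinary word embedding

enc = HybridWordEncoder(word_vocab=50_000, char_vocab=128)
vec = enc.embed_word(torch.tensor([3]), torch.tensor([[10, 4, 7, 22]]), is_rare=True)
print(vec.shape)  # torch.Size([1, 256])
```

A symmetric character-level decoder can then spell out target words the word-level softmax cannot produce, which is how the paper's systems avoid emitting unknown-word tokens.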

Cited by 266 publications (256 citation statements) · References 20 publications
“…Chung et al. [4] focus on handling translation at the character level, without any word segmentation, on the target side only. Luong et al. [13] propose a novel hybrid architecture that combines the strengths of both word- and character-based models. Sennrich et al. [20] use the BPE method to encode rare and unknown words as sequences of subword units.…”
Section: Related Work
confidence: 99%
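To make the BPE reference concrete, here is a minimal, self-contained Python sketch of the merge-learning loop in the spirit of Sennrich et al. [20]; the function name, toy corpus, and simplifications (no frequency thresholds, no final vocabulary output) are mine. Each iteration merges the most frequent adjacent symbol pair, so rare words end up segmented into frequent subword units.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    # Represent each word as a tuple of symbols, with an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge everywhere it occurs.
        merged = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += freq
        vocab = merged
    return merges

print(learn_bpe(["low", "lower", "lowest", "low"], num_merges=3))
# e.g. [('l', 'o'), ('lo', 'w'), ('low', '</w>')]
```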
“…Most of these operate below the word level, e.g. characters [4], hybrid word-character models [13,25], and more intelligent subwords [20,25]. Besides, pioneering studies [25,8] demonstrate that translation tasks involving Chinese are among the most difficult problems for NMT systems.…”
Section: Introduction
confidence: 99%
“…Let us give just a few examples of usage: text classification [16], part-of-speech tagging [17,18], language modeling [19], sentiment analysis [20], and text normalization [21]. Recently, the concept of using subwords to form a representation has appeared [22,23]. Another work [24] suggests guiding word embeddings with morphologically annotated data and demonstrates gains using German as a case study.…”
Section: Introduction
confidence: 99%
“…This type of model needs no tokenization, freeing the system from one source of errors. Character-level neural models have been applied to several NLP tasks, ranging from relatively basic tasks such as text categorization and language modeling to complex prediction tasks such as translation (Luong and Manning, 2016; Sennrich et al., 2016). In particular, character-based neural models are attractive because they can take sub-word units, such as morphology, into account. Morphological analysis and prediction models using character-based recurrent neural networks have recently become popular, as evidenced by their complete dominance at the SIGMORPHON shared task on morphological reinflection.…”
Section: Introduction
confidence: 99%
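As a trivial illustration of the "no tokenization" point above (my own example, not from any cited paper), a character-level model consumes the raw string directly, so no upstream word segmenter can introduce errors:

```python
# The raw string itself is the input sequence: one integer per character
# (here simply Unicode code points), with no word splitting anywhere.
text = "¡Hola, mundo!"              # works for any script or language
char_ids = [ord(c) for c in text]
print(char_ids[:5])                 # [161, 72, 111, 108, 97]
```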