2018
DOI: 10.1162/tacl_a_00029

Learning to Remember Translation History with a Continuous Cache

Abstract: Existing neural machine translation (NMT) models generally translate sentences in isolation, missing the opportunity to take advantage of document-level information. In this work, we propose to augment NMT models with a very light-weight cache-like memory network, which stores recent hidden representations as translation history. The probability distribution over generated words is updated online depending on the translation history retrieved from the memory, endowing NMT models with the capability to dynamically adapt over time.
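The abstract describes retrieving recent hidden representations from a cache and updating the output distribution online. Below is a minimal NumPy sketch of such a cache-like memory, assuming FIFO eviction, dot-product matching, and a scalar gate; the names (ContinuousCache, gate_combine, w_gate) and shapes are illustrative and not the paper's exact formulation.

```python
# Minimal sketch of a cache-like memory for translation history (hypothetical
# shapes/names; not the authors' exact formulation). The cache stores recent
# (key, value) pairs of hidden representations; at each decoding step the
# current state queries the cache and the retrieved vector is gated into the
# state used to predict the next word.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ContinuousCache:
    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.keys = np.empty((0, dim))    # e.g. attention context vectors
        self.values = np.empty((0, dim))  # e.g. decoder hidden states

    def write(self, key, value):
        # Append the newest entry; drop the oldest when over capacity (FIFO).
        self.keys = np.vstack([self.keys, key])[-self.capacity:]
        self.values = np.vstack([self.values, value])[-self.capacity:]

    def read(self, query):
        # Dot-product matching of the query against cached keys,
        # then a convex combination of the cached values.
        if len(self.keys) == 0:
            return np.zeros_like(query)
        weights = softmax(self.keys @ query)
        return weights @ self.values

def gate_combine(state, cache_vec, w_gate):
    # Scalar gate deciding how much retrieved history to mix in
    # (a simplification of the paper's gating; w_gate is a learned vector here).
    g = 1.0 / (1.0 + np.exp(-w_gate @ np.concatenate([state, cache_vec])))
    return g * state + (1.0 - g) * cache_vec

# Usage: query with the current decoder state, then write it back afterwards.
dim = 8
rng = np.random.default_rng(0)
cache = ContinuousCache(capacity=25, dim=dim)
w_gate = rng.normal(size=2 * dim)
for _ in range(5):                       # pretend decoding steps
    state = rng.normal(size=dim)         # current decoder hidden state
    fused = gate_combine(state, cache.read(state), w_gate)
    cache.write(key=state, value=fused)  # store history for later retrieval
```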

Cited by 147 publications (154 citation statements)
References 17 publications
“…As mentioned previously, the Multi-Head Context Attention sub-layer is part of the Context Layer (Figure 2), the output of which is fed into the Transformer architecture through context gating (Tu et al., 2018). For the i-th word in the source or target:…”
Section: Context Gating
Mentioning; confidence: 99%
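A minimal sketch of the element-wise context gating this excerpt refers to, assuming the gate is a sigmoid over linear transforms of the word's hidden state h_i and the context-layer output d_i; the weight names (W_h, W_d, b) are hypothetical and not taken from the cited papers.

```python
# Sketch of per-position context gating (hypothetical weight names):
# an element-wise gate between the word representation h_i and the
# document-level context vector d_i produced by the Context Layer.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(h_i, d_i, W_h, W_d, b):
    """Element-wise gate deciding, per dimension, how much of the
    Transformer hidden state vs. the context-layer output to pass on."""
    lam = sigmoid(W_h @ h_i + W_d @ d_i + b)
    return lam * h_i + (1.0 - lam) * d_i

d = 16
rng = np.random.default_rng(1)
h_i = rng.normal(size=d)              # hidden state of the i-th word
d_i = rng.normal(size=d)              # multi-head context attention output
out = context_gate(h_i, d_i,
                   W_h=rng.normal(size=(d, d)),
                   W_d=rng.normal(size=(d, d)),
                   b=np.zeros(d))
```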
“…We again add the Document-level Context Layer alongside the decoder stack as in Figure 3. However, instead of choosing the keys and values to be monolingual as in the encoder, we follow Tu et al. (2018) in choosing the key to match the source-side context, while designing the value to match the target-side context. Hence, the keys (in the Decoder Context Encoding block) are composed of context vectors from the Source Attention sub-layer, while the values are composed of the hidden representations of the target words, both from the last decoder layer.…”
Section: Bilingual Context Integration in Decoder
Mentioning; confidence: 99%
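A minimal sketch of the key/value choice described in this excerpt, assuming single-head scaled dot-product attention: keys are source-attention context vectors, values are target-side hidden representations, and the current decoder state is the query. The function name and shapes are illustrative only.

```python
# Sketch of bilingual context attention (hypothetical shapes): match the
# query against source-side context keys, return a mixture of target-side
# hidden states.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def document_context_attention(query, src_context_keys, tgt_hidden_values):
    """Single-head scaled dot-product attention: keys from the Source
    Attention sub-layer, values from the target words' hidden states."""
    d = query.shape[-1]
    scores = src_context_keys @ query / np.sqrt(d)   # (num_cached,)
    weights = softmax(scores)
    return weights @ tgt_hidden_values               # (d,)

d, n_cached = 16, 10
rng = np.random.default_rng(2)
ctx = document_context_attention(
    query=rng.normal(size=d),                        # current decoder state
    src_context_keys=rng.normal(size=(n_cached, d)),
    tgt_hidden_values=rng.normal(size=(n_cached, d)),
)
```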
“…Augmented Dynamic Memory. Despite positive results obtained so far, a particular problem with the neural network approach is its tendency to favor frequent observations while overlooking special cases that are not frequently observed. This weakness with regard to infrequent cases has been noticed by a number of researchers who propose an augmented dynamic memory for multiple applications, such as language models (Daniluk et al., 2017; Grave et al., 2016), question answering (Miller et al., 2016), and machine translation (Feng et al., 2017; Tu et al., 2017). We find that current sentence simplification models suffer from a similar neglect of infrequent simplification rules, which inspires us to explore augmented dynamic memory.…”
Section: Related Work
Mentioning; confidence: 99%
“…The attributes of a product can be seen as structured knowledge data in our task. As key-value memory networks (KVMN) have been shown to be effective at utilizing structured data [10,14,32], in our work we employ a KVMN to store product attributes for generating answers. Correspondingly, we store the word embedding of each attribute's key and value in the KVMN.…”
Section: Attributes Encoder
Mentioning; confidence: 99%
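A minimal sketch of a key-value memory read over product attributes as this excerpt describes, assuming embeddings of attribute keys address the memory and a weighted sum of attribute-value embeddings is returned; the toy attributes and the embed stand-in are hypothetical.

```python
# Sketch of a KVMN read over product attributes (hypothetical attribute
# names and embedding stand-in): keys are embeddings of attribute keys,
# values are embeddings of attribute values, addressed by a query vector.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kvmn_read(query, attr_key_embs, attr_value_embs):
    """Address the memory by similarity to the attribute keys, then return
    the weighted sum of the attribute-value embeddings."""
    weights = softmax(attr_key_embs @ query)
    return weights @ attr_value_embs

d = 16
rng = np.random.default_rng(3)
attributes = {"brand": "Acme", "color": "red", "size": "M"}  # toy product
embed = lambda s: rng.normal(size=d)   # stand-in for a word-embedding lookup
keys = np.stack([embed(k) for k in attributes])
values = np.stack([embed(v) for v in attributes.values()])
answer_context = kvmn_read(rng.normal(size=d), keys, values)
```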