2018
DOI: 10.1162/tacl_a_00029

Learning to Remember Translation History with a Continuous Cache

Abstract: Existing neural machine translation (NMT) models generally translate sentences in isolation, missing the opportunity to take advantage of document-level information. In this work, we propose to augment NMT models with a very light-weight cache-like memory network, which stores recent hidden representations as translation history. The probability distribution over generated words is updated online depending on the translation history retrieved from the memory, endowing NMT models with the capability to dynamically adapt over time.
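The abstract describes retrieving recent hidden representations from a cache and updating the output distribution online. Below is a minimal NumPy sketch of such a cache-like memory, assuming FIFO eviction, dot-product matching, and a scalar gate; the names (ContinuousCache, gate_combine, w_gate) and shapes are illustrative and not the paper's exact formulation.

```python
# Minimal sketch of a cache-like memory for translation history (hypothetical
# shapes/names; not the authors' exact formulation). The cache stores recent
# (key, value) pairs of hidden representations; at each decoding step the
# current state queries the cache and the retrieved vector is gated into the
# state used to predict the next word.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ContinuousCache:
    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.keys = np.empty((0, dim))    # e.g. attention context vectors
        self.values = np.empty((0, dim))  # e.g. decoder hidden states

    def write(self, key, value):
        # Append the newest entry; drop the oldest when over capacity (FIFO).
        self.keys = np.vstack([self.keys, key])[-self.capacity:]
        self.values = np.vstack([self.values, value])[-self.capacity:]

    def read(self, query):
        # Dot-product matching of the query against cached keys,
        # then a convex combination of the cached values.
        if len(self.keys) == 0:
            return np.zeros_like(query)
        weights = softmax(self.keys @ query)
        return weights @ self.values

def gate_combine(state, cache_vec, w_gate):
    # Scalar gate deciding how much retrieved history to mix in
    # (a simplification of the paper's gating; w_gate is a learned vector here).
    g = 1.0 / (1.0 + np.exp(-w_gate @ np.concatenate([state, cache_vec])))
    return g * state + (1.0 - g) * cache_vec

# Usage: query with the current decoder state, then write it back afterwards.
dim = 8
rng = np.random.default_rng(0)
cache = ContinuousCache(capacity=25, dim=dim)
w_gate = rng.normal(size=2 * dim)
for _ in range(5):                       # pretend decoding steps
    state = rng.normal(size=dim)         # current decoder hidden state
    fused = gate_combine(state, cache.read(state), w_gate)
    cache.write(key=state, value=fused)  # store history for later retrieval
```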

Cited by 147 publications (154 citation statements)
References 17 publications
“…As mentioned previously, the Multi-Head Context Attention sub-layer is part of the Context Layer (Figure 2), the output of which is fed into the Transformer architecture through context gating (Tu et al., 2018). For the i-th word in the source or target:…”
Section: Context Gating
Mentioning; confidence: 99%
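A minimal sketch of the element-wise context gating this excerpt refers to, assuming the gate is a sigmoid over linear transforms of the word's hidden state h_i and the context-layer output d_i; the weight names (W_h, W_d, b) are hypothetical and not taken from the cited papers.

```python
# Sketch of per-position context gating (hypothetical weight names):
# an element-wise gate between the word representation h_i and the
# document-level context vector d_i produced by the Context Layer.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(h_i, d_i, W_h, W_d, b):
    """Element-wise gate deciding, per dimension, how much of the
    Transformer hidden state vs. the context-layer output to pass on."""
    lam = sigmoid(W_h @ h_i + W_d @ d_i + b)
    return lam * h_i + (1.0 - lam) * d_i

d = 16
rng = np.random.default_rng(1)
h_i = rng.normal(size=d)              # hidden state of the i-th word
d_i = rng.normal(size=d)              # multi-head context attention output
out = context_gate(h_i, d_i,
                   W_h=rng.normal(size=(d, d)),
                   W_d=rng.normal(size=(d, d)),
                   b=np.zeros(d))
```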
“…We again add the Document-level Context Layer alongside the decoder stack as in Figure 3. However, instead of choosing the keys and values to be monolingual as in the encoder, we follow Tu et al. (2018) in choosing the key to match the source-side context, while designing the value to match the target-side context. Hence, the keys (in the Decoder Context Encoding block) are composed of context vectors from the Source Attention sub-layer, while the values are composed of the hidden representations of the target words, both from the last decoder layer.…”
Section: Bilingual Context Integration in Decoder
Mentioning; confidence: 99%
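A minimal sketch of the key/value choice described in this excerpt, assuming single-head scaled dot-product attention: keys are source-attention context vectors, values are target-side hidden representations, and the current decoder state is the query. The function name and shapes are illustrative only.

```python
# Sketch of bilingual context attention (hypothetical shapes): match the
# query against source-side context keys, return a mixture of target-side
# hidden states.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def document_context_attention(query, src_context_keys, tgt_hidden_values):
    """Single-head scaled dot-product attention: keys from the Source
    Attention sub-layer, values from the target words' hidden states."""
    d = query.shape[-1]
    scores = src_context_keys @ query / np.sqrt(d)   # (num_cached,)
    weights = softmax(scores)
    return weights @ tgt_hidden_values               # (d,)

d, n_cached = 16, 10
rng = np.random.default_rng(2)
ctx = document_context_attention(
    query=rng.normal(size=d),                        # current decoder state
    src_context_keys=rng.normal(size=(n_cached, d)),
    tgt_hidden_values=rng.normal(size=(n_cached, d)),
)
```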
“…Augmented Dynamic Memory. Despite positive results obtained so far, a particular problem with the neural network approach is its tendency to favor frequent observations while overlooking special cases that are not frequently observed. This weakness with regard to infrequent cases has been noticed by a number of researchers who propose an augmented dynamic memory for multiple applications, such as language models (Daniluk et al., 2017; Grave et al., 2016), question answering (Miller et al., 2016), and machine translation (Feng et al., 2017; Tu et al., 2017). We find that current sentence simplification models suffer from a similar neglect of infrequent simplification rules, which inspires us to explore augmented dynamic memory.…”
Section: Related Work
Mentioning; confidence: 99%
“…The attributes of a product can be seen as structured knowledge data in our task. As key-value memory networks (KVMN) have been shown to be effective at utilizing structured data [10,14,32], in our work we employ a KVMN to store product attributes for generating answers. Correspondingly, we store the word embedding of each attribute's key and value in the KVMN.…”
Section: Attributes Encoder
Mentioning; confidence: 99%
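A minimal sketch of a key-value memory read over product attributes as this excerpt describes, assuming embeddings of attribute keys address the memory and a weighted sum of attribute-value embeddings is returned; the toy attributes and the embed stand-in are hypothetical.

```python
# Sketch of a KVMN read over product attributes (hypothetical attribute
# names and embedding stand-in): keys are embeddings of attribute keys,
# values are embeddings of attribute values, addressed by a query vector.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kvmn_read(query, attr_key_embs, attr_value_embs):
    """Address the memory by similarity to the attribute keys, then return
    the weighted sum of the attribute-value embeddings."""
    weights = softmax(attr_key_embs @ query)
    return weights @ attr_value_embs

d = 16
rng = np.random.default_rng(3)
attributes = {"brand": "Acme", "color": "red", "size": "M"}  # toy product
embed = lambda s: rng.normal(size=d)   # stand-in for a word-embedding lookup
keys = np.stack([embed(k) for k in attributes])
values = np.stack([embed(v) for v in attributes.values()])
answer_context = kvmn_read(rng.normal(size=d), keys, values)
```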