Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.307
Point, Disambiguate and Copy: Incorporating Bilingual Dictionaries for Neural Machine Translation

Abstract: This paper proposes a sophisticated neural architecture to incorporate bilingual dictionaries into Neural Machine Translation (NMT) models. By introducing three novel components: Pointer, Disambiguator, and Copier, our method PDC inherently achieves the following merits compared with previous efforts: (1) Pointer leverages the semantic information from bilingual dictionaries, for the first time, to better locate source words whose translation in dictionaries can potentially be used; (2) Disambiguator synthesizes…

Cited by 10 publications (4 citation statements)
References 18 publications
“…The initial embedding is based on BERT and the hyperparameters are the same as in their paper. GatedGCN exploits gate diversity and the overall contextual importance scores of the words on graph convolutional neural networks [4]. The dependency tree is built with the Stanford CoreNLP toolkit.…”
Section: Appendix D Comparison Methods Details
confidence: 99%
“…MOGANED code is available at https://github.com/ll0ruc/MOGANED
[3] DMBERT code is available at https://github.com/Bakser/DMBERT
[4] GatedGCN code is available at https://github.com/laiviet/ed-gated-gcn
[5] EE-GCN code is available at https://github.com/cuishiyao96/eegcned
[6] MLBiNet code is available at https://github.com/zjunlp/DocED…”
confidence: 99%
“…Low-Frequency Word Translation is a persistent challenge for NMT due to the token imbalance phenomenon. Conventional research ranges from introducing fine-grained translation units (Luong and Manning 2016; Lee, Cho, and Hofmann 2017), to seeking optimal vocabularies (Wu et al. 2016; Sennrich, Haddow, and Birch 2016; Gowda and May 2020; Liu et al. 2021), to incorporating external lexical knowledge (Luong et al. 2015; Arthur, Neubig, and Nakamura 2016; Zhang et al. 2021). Recently, some approaches alleviate this problem with well-designed loss functions whose adaptive weights depend on token frequency (Gu et al. 2020) or bilingual mutual information (Xu et al. 2021b).…”
Section: Related Work
confidence: 99%
“…Implementation Details. We examine our model on the Transformer architecture with the base setting (Vaswani et al. 2017). All baseline systems and our models are implemented on top of the THUMT toolkit (Zhang et al. 2017). During training, the dropout rate and label smoothing are both set to 0.1.…”
Section: Experimental Settings
confidence: 99%