Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the decoder inputs is important and largely impacts model accuracy. In this paper, we propose two methods to enhance the decoder inputs so as to improve NAT models. The first directly leverages a phrase table generated by conventional SMT approaches to translate source tokens into target tokens, which are then fed into the decoder as inputs. The second transforms source-side word embeddings into target-side word embeddings through sentence-level alignment and word-level adversarial learning, and then feeds the transformed word embeddings into the decoder as inputs. Experimental results show that our method outperforms the NAT baseline (Gu et al. 2017) by 5.11 BLEU points on the WMT14 English-German task and 4.72 BLEU points on the WMT16 English-Romanian task.
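As a rough illustration of the first method, the sketch below builds NAT decoder inputs by translating each source token through a toy phrase table. The table entries and the `<unk>` fallback are hypothetical stand-ins for a table extracted by an SMT toolkit, not the paper's actual setup:

```python
# Toy phrase table: source token -> most probable target token.
# Entries are hypothetical; a real table would come from an SMT toolkit.
phrase_table = {
    "das": "the",
    "haus": "house",
    "ist": "is",
    "klein": "small",
}

def enhance_decoder_inputs(source_tokens, table, unk="<unk>"):
    """Map each source token to a target token via the phrase table,
    falling back to a placeholder for unknown tokens, so the decoder
    input keeps one token per source position."""
    return [table.get(tok, unk) for tok in source_tokens]

print(enhance_decoder_inputs(["das", "haus", "ist", "gross"], phrase_table))
# -> ['the', 'house', 'is', '<unk>']
```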
kNN-MT, recently proposed by Khandelwal et al. (2020a), successfully combines a pretrained neural machine translation (NMT) model with token-level k-nearest-neighbor (kNN) retrieval to improve translation accuracy. However, the traditional kNN algorithm used in kNN-MT simply retrieves the same number of nearest neighbors for each target token, which may cause prediction errors when the retrieved neighbors include noise. In this paper, we propose Adaptive kNN-MT to dynamically determine the value of k for each target token. We achieve this by introducing a lightweight Meta-k Network, which can be efficiently trained with only a few training samples. On four benchmark machine translation datasets, we demonstrate that the proposed method effectively filters out noise in the retrieval results and significantly outperforms the vanilla kNN-MT model. Even more noteworthy, a Meta-k Network learned on one domain can be directly applied to other domains and obtain consistent improvements, illustrating the generality of our method. Our implementation is open-sourced at https://github.com/zhengxxn/adaptive-knn-mt.
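A minimal sketch of how such a Meta-k Network might look, using the sorted neighbor distances as input features; the layer sizes, feature choice, and candidate set of k values (zero plus powers of two) are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaKNetwork(nn.Module):
    """Predict a distribution over candidate values of k from the sorted
    distances of the retrieved neighbors. k = 0 means 'ignore retrieval'
    and fall back to the plain NMT prediction."""

    def __init__(self, max_k=8, hidden=32):
        super().__init__()
        # Candidate k values: 0 plus powers of two up to max_k.
        self.k_choices = [0] + [2 ** i for i in range(max_k.bit_length())
                                if 2 ** i <= max_k]
        self.net = nn.Sequential(
            nn.Linear(max_k, hidden),
            nn.Tanh(),
            nn.Linear(hidden, len(self.k_choices)),
        )

    def forward(self, distances):
        # distances: (batch, max_k), sorted ascending.
        return F.softmax(self.net(distances), dim=-1)

meta_k = MetaKNetwork(max_k=8)
dists = torch.rand(2, 8).sort(dim=-1).values  # toy neighbor distances
weights = meta_k(dists)                       # (2, 5): one weight per candidate k
```

In use, these weights would mix the translation distributions obtained from the top-k retrieved neighbors for each candidate k, with k = 0 reducing to the plain NMT output.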
Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and that both share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than 1 BLEU point) over previous NAT baselines in terms of translation accuracy, and greatly speeds up inference (more than 10 times) over AT baselines.
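The sketch below illustrates one plausible form of such a curriculum: each decoder input position keeps the previous gold target token (autoregressive teacher forcing) or is replaced by a placeholder (non-autoregressive input), with the replacement probability growing over fine-tuning. The linear pacing function and the `<mask>` placeholder are assumptions for illustration:

```python
import random

def curriculum_decoder_inputs(prev_gold_tokens, step, total_steps,
                              placeholder="<mask>"):
    """Mix AT- and NAT-style decoder inputs: keep the previous gold target
    token (teacher forcing) with shrinking probability, replace it with a
    placeholder with growing probability as fine-tuning progresses."""
    p_nat = min(1.0, step / total_steps)  # fraction of NAT-style positions
    return [placeholder if random.random() < p_nat else tok
            for tok in prev_gold_tokens]

# Early in fine-tuning: mostly gold tokens (close to AT training).
print(curriculum_decoder_inputs(["<s>", "the", "cat", "sat"], 100, 10000))
# Late in fine-tuning: mostly placeholders (close to NAT training).
print(curriculum_decoder_inputs(["<s>", "the", "cat", "sat"], 9900, 10000))
```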
The masked language model has received remarkable attention due to its effectiveness on various natural language processing tasks. However, few works have adopted this technique in sequence-to-sequence models. In this work, we introduce a jointly masked sequence-to-sequence model and explore its application to non-autoregressive neural machine translation (NAT). Specifically, we first empirically study the functionalities of the encoder and the decoder in NAT models, and find that the encoder plays a more important role than the decoder with regard to translation quality. Therefore, we propose to train the encoder more rigorously by masking the encoder input during training. As for the decoder, we propose to train it with consecutive masking of the decoder input and an n-gram loss function to alleviate the problem of translating duplicate words. The two types of masks are applied to the model jointly at the training stage. We conduct experiments on five benchmark machine translation tasks, and our model achieves 27.69/32.24 BLEU scores on the WMT14 English-German/German-English tasks with a more than 5 times speedup compared with an autoregressive model.
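A minimal sketch of the two masking schemes, assuming a random token-level mask on the encoder side and a single consecutive span on the decoder side; the masking ratio, span length, and single-span choice are illustrative assumptions:

```python
import random

def mask_encoder_input(tokens, p=0.15, mask="<mask>"):
    """Randomly mask encoder input tokens (ratio p is an assumption)."""
    return [mask if random.random() < p else t for t in tokens]

def mask_decoder_span(tokens, n=3, mask="<mask>"):
    """Mask one consecutive span of n decoder input tokens, matching the
    consecutive-masking idea described above."""
    if len(tokens) <= n:
        return [mask] * len(tokens)
    start = random.randrange(len(tokens) - n + 1)
    return [mask if start <= i < start + n else t
            for i, t in enumerate(tokens)]

src = ["the", "house", "is", "small", "."]
tgt = ["das", "haus", "ist", "klein", "."]
print(mask_encoder_input(src))  # e.g. ['the', '<mask>', 'is', 'small', '.']
print(mask_decoder_span(tgt))   # e.g. ['das', '<mask>', '<mask>', '<mask>', '.']
```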
Recent advances in the field of network embedding have shown that low-dimensional network representation plays a critical role in network analysis. Most existing network embedding methods encode the local proximity of a node, such as the first- and second-order proximities. While being efficient, these methods fall short of leveraging the global structural information between nodes distant from each other. In addition, most existing methods learn embeddings on one single fixed network, and thus cannot be generalized to unseen nodes or networks without retraining. In this paper we present SPINE, a method that can jointly capture the local proximity and proximities at any distance, while being inductive so as to efficiently deal with unseen nodes or networks. Extensive experimental results on benchmark datasets demonstrate the superiority of the proposed framework over the state of the art.
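As a rough sketch of combining local and long-range proximity inductively, the code below scores all nodes by a random walk with restart (rooted PageRank) from a target node and embeds that node as a weighted sum of the features of its top-scoring neighbors; whether this matches SPINE's exact formulation is an assumption, and the function names are hypothetical:

```python
import numpy as np

def rooted_pagerank(adj, root, alpha=0.5, iters=50):
    """Random walk with restart from `root`; the stationary visiting
    probabilities capture proximity at any distance, not just 1- or
    2-hop neighborhoods."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    P = adj / deg                     # row-stochastic transition matrix
    r = np.zeros(n)
    r[root] = 1.0
    pi = r.copy()
    for _ in range(iters):
        pi = (1 - alpha) * r + alpha * pi @ P
    return pi

def inductive_embedding(adj, feats, root, top=4):
    """Embed a (possibly unseen) node by aggregating the content features
    of its top-scoring rooted-PageRank neighbors; since no per-node
    parameters are learned, the scheme stays inductive."""
    scores = rooted_pagerank(adj, root)
    idx = np.argsort(-scores)[:top]
    return (scores[idx, None] * feats[idx]).sum(axis=0)

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.eye(4)  # toy node features
print(inductive_embedding(adj, feats, root=0))
```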
To improve the Al/Steel bimetallic interface, Eu was first added to an Al/Steel bimetallic interface made by liquid-solid casting. The effects of Eu addition on the microstructure, mechanical properties, and fracture behavior of the Al/Steel bimetallic interface were studied in detail. With the addition of 0.1 wt.% Eu, the morphology of eutectic Si in the Al-Si alloy changed from coarse plate-like to fine fibrous and granular, and the average thickness of the intermetallic compound layer decreased to a minimum of 7.96 μm. In addition, the Fe concentration dropped more sharply on the steel side, and more Si was observed on the Al side than under the other conditions. The addition of Eu did not change the kinds of intermetallic compounds in the Al/Steel reaction layer, which was composed of Al5Fe2, τ1-(Al, Si)5Fe3, Al13Fe4, τ5-Al7Fe2Si, and τ6-Al9Fe2Si2 phases. Nor did Eu change the preferential orientation of these phases, but it refined the grain size of each phase and decreased the polar density of the Al5Fe2 phase. Eu was mainly enriched at the front of the ternary compound layer (τ6-Al9Fe2Si2) near the Al side and the steel matrix. The Fe and Al distribution regions at the interface tended to narrow after the addition of 0.1 wt.% Eu, probably because Eu inhibits the diffusion of Al atoms along the c-axis of the Al5Fe2 phase and the growth of the Al13Fe4, τ5-Al7Fe2Si, and τ6-Al9Fe2Si2 phases. When the Eu content was 0.1 wt.%, the shear strength of the Al/Steel bimetal reached a maximum of 31.21 MPa, 47% higher than that of the bimetal without Eu.