Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1459

Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks

Abstract: In this paper, we propose an addition-subtraction twin-gated recurrent network (ATR) to simplify neural machine translation. The recurrent units of ATR are heavily simplified to have the smallest number of weight matrices among units of all existing gated RNNs. With simple addition and subtraction operations, we introduce a twin-gated mechanism to build input and forget gates which are highly correlated. Despite this simplification, the essential non-linearities and capability of modeling long-distance dependencies…
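The abstract describes recurrent units that keep only two weight matrices and build the input and forget gates as the sum and difference of the same two projections. The sketch below is a minimal NumPy rendering of such a cell; the class name, initialization, and the final state update are illustrative assumptions based on this description, not the paper's reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ATRCell:
    """Sketch of an addition-subtraction twin-gated recurrent (ATR) cell.

    Only two weight matrices are used (W for the input, U for the recurrent
    state), matching the abstract's claim of the smallest number of weight
    matrices among gated RNN units. The gate equations follow the twin-gated
    mechanism described in the abstract; see the paper for the exact cell.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(hidden_size, input_size))
        self.U = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

    def step(self, x_t, h_prev):
        p_t = self.W @ x_t           # projected input
        q_t = self.U @ h_prev        # projected history
        i_t = sigmoid(p_t + q_t)     # input gate: addition of the twins
        f_t = sigmoid(p_t - q_t)     # forget gate: subtraction of the twins
        return i_t * p_t + f_t * h_prev
```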

Cited by 13 publications (10 citation statements)
References 29 publications

“…In this section, we explain the datasets, model architectures, optimization details and evaluation metrics used in our experiments. All implementations are based on the zero toolkit (Zhang et al., 2018; https://github.com/bzhangGo/zero). Regarding audio preprocessing, we use the given audio segmentation (train/dev/test) for experiments. We extract 40-dimensional log-Mel filterbanks with a step size of 10ms and window size of 25ms as the acoustic features, followed by feature expansion via second-order derivatives and mean-variance normalization.…”
Section: Experimental Settings
confidence: 99%
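The snippet above specifies the acoustic front-end: 40-dimensional log-Mel filterbanks over 25 ms windows with a 10 ms step, expanded with derivative features and mean-variance normalized. A hedged librosa sketch of that pipeline is given below; it is not the zero toolkit's implementation, and the 16 kHz sampling rate, the log floor, and the inclusion of first-order deltas alongside the second-order ones are assumptions.

```python
import numpy as np
import librosa

def extract_features(wav_path, sr=16000, n_mels=40):
    """40-dim log-Mel filterbanks (25 ms window, 10 ms step) with delta
    expansion and per-utterance mean-variance normalization."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=n_mels,
        win_length=int(0.025 * sr),   # 25 ms analysis window
        hop_length=int(0.010 * sr),   # 10 ms step
    )
    logmel = np.log(mel + 1e-10)                     # log compression
    delta = librosa.feature.delta(logmel, order=1)   # first-order derivatives (assumed)
    delta2 = librosa.feature.delta(logmel, order=2)  # second-order derivatives
    feats = np.concatenate([logmel, delta, delta2], axis=0)
    mean = feats.mean(axis=1, keepdims=True)
    std = feats.std(axis=1, keepdims=True) + 1e-10
    return ((feats - mean) / std).T                  # frames x features
```
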
“…We also compare our approach with the Averaged Attention Network (AAN) decoder (Zhang et al., 2018a), LN-LSTM and the Addition-subtraction Twin-gated Recurrent (ATR) network (Zhang et al., 2018b) on the WMT14 En-De task.…”
Section: Results
confidence: 99%
“…LSTM (Hochreiter and Schmidhuber, 1997) and GRU (Cho et al., 2014) are the most popular recurrent models. To accelerate RNN models, Zhang et al. (2018b) propose a heavily simplified ATR network with the smallest number of weight matrices among units of all existing gated RNNs. Peter et al. (2016) investigate exponentially decaying bag-of-words input features for feedforward NMT models.…”
Section: Related Work
confidence: 99%
“…Our goal is to design a more concise deep learning model that preserves accuracy while being easier to deploy on resource-limited IoT micro-controllers, which are too small to store more complex recurrent network models. Taking the standard RNN, LSTM, and GRU models as the benchmark, and drawing inspiration from the addition-subtraction twin-gated recurrent (ATR) cell proposed in the literature [21], we propose a more concise and smaller gated recurrent cell for PQD detection.…”
Section: The Proposed SGRN Method
confidence: 99%
“…In the design process, we draw inspiration from the ATR structure proposed in [21] and add a mechanism similar to self-attention to the proposed recurrent cell. We analyze SGRN by decomposing its recurrent structure, which can be explained by expanding Equation (4).…”
Section: Analysis of SGRN Structure
confidence: 99%
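The analysis described here unrolls the recurrence. Since Equation (4) of the SGRN paper is not reproduced in the snippet, the expansion below is sketched for an ATR-style update h_t = i_t ⊙ p_t + f_t ⊙ h_{t-1} instead, showing how the hidden state becomes a forget-gate-weighted sum of gated inputs.

```latex
% Illustrative expansion of an ATR-style recurrence (a stand-in for SGRN's
% Equation (4), which is not shown above); h_0 is assumed to be zero.
\begin{aligned}
h_t &= i_t \odot p_t + f_t \odot h_{t-1} \\
    &= i_t \odot p_t + f_t \odot \left( i_{t-1} \odot p_{t-1} + f_{t-1} \odot h_{t-2} \right) \\
    &= \sum_{k=1}^{t} \Bigl( \prod_{j=k+1}^{t} f_j \Bigr) \odot i_k \odot p_k .
\end{aligned}
```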