Accurate estimation of future network traffic is a key enabler for early warning of network degradation and automated orchestration of network resources. The long short-term memory (LSTM) neural network is a popular architecture for network traffic forecasting and has been used successfully in many applications. However, LSTMs have been observed to suffer from limited memory capacity when the input sequence is long. In this paper, we propose a gated dilated causal convolution based encoder-decoder (GDCC-ED) model for network traffic forecasting. In the encoder, the GDCC-ED learns a vector representation from historical network traffic series, adopting gated dilated causal convolutions to expand the long-range memory capacity. Moreover, features of different types, both temporal-independent and temporal-related, are incorporated. In the decoder, the GDCC-ED uses an RNN with LSTM units to map the vector representation back to a variable-length target sequence. In addition, a sequence data augmentation technique is designed to address the problem of data scarcity. Experimental results demonstrate that our model outperforms state-of-the-art algorithms by 11.6%.

INDEX TERMS Network traffic forecasting, dilated causal convolution, gated activations, encoder-decoder.
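
To make the encoder's building block concrete, the following is a minimal single-channel sketch of a gated dilated causal convolution in plain Python. It is not the GDCC-ED implementation: the abstract does not state the gating form, so a WaveNet-style gate (tanh filter multiplied by a sigmoid gate) is assumed here, and all function names, kernel sizes, and weights are hypothetical choices made for illustration.

```python
import math


def dilated_causal_conv(x, weights, dilation):
    """1-D causal convolution with a dilation factor.

    weights[k] multiplies x[t - k * dilation], so output[t] depends only on
    the current and past samples. Positions before the start of the series
    are treated as zeros (equivalent to left zero-padding).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def gated_dilated_causal_layer(x, filter_w, gate_w, dilation):
    """Assumed gating form: tanh(filter branch) * sigmoid(gate branch)."""
    f = dilated_causal_conv(x, filter_w, dilation)
    g = dilated_causal_conv(x, gate_w, dilation)
    return [math.tanh(fi) / (1.0 + math.exp(-gi)) for fi, gi in zip(f, g)]


def receptive_field(kernel_size, dilations):
    """Number of input steps visible to one output of a stack of layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical traffic samples and weights, for illustration only.
series = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7, 0.3, 0.8]
hidden = gated_dilated_causal_layer(series, [0.6, -0.3], [0.4, 0.2], dilation=2)

# Doubling the dilation per layer grows the receptive field exponentially
# with depth: four layers of kernel size 2 cover 16 time steps.
print(receptive_field(2, [1, 2, 4, 8]))  # 16
```

Stacking such layers with increasing dilation is what lets a convolutional encoder cover long histories with few layers, which is the long-range memory property the abstract attributes to the encoder.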