Shale gas plays an important role in supplementing energy supply and reducing carbon footprint. Accurate and efficient prediction of shale gas production is important for optimizing completion parameters. This paper established a combined gated recurrent unit and multilayer perceptron neural network (GRU-MLP model) to forecast production from multistage fractured horizontal shale gas wells. The nondominated sorting genetic algorithm II (NSGA-II) was introduced into the model to optimize its architecture automatically. Training datasets were generated with embedded discrete fracture models (EDFM) and a reservoir simulator. A sensitivity analysis was also carried out to rank the importance of the input variables and to support history matching. The results showed that the GRU-MLP model predicts the productivity of multistage fractured horizontal shale gas wells accurately and efficiently, and that it fits the peak values of shale gas production better. Compared with recurrent neural network (RNN), long short-term memory (LSTM), and GRU models, the GRU-MLP hybrid model achieved higher accuracy within an acceptable computational time. The mean absolute percentage error (MAPE) and root mean square percentage error (RMSPE) of the shale gas production predicted by the GRU-MLP model were 3.90% and 3.93%, respectively, which are 84.87% and 84.88% smaller than those of the GRU model. These results indicate that the physics-constrained data-driven method outperformed a purely data-driven method. The main results of the study are expected to contribute to the intelligent development of shale gas production prediction.
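The two error metrics quoted above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MAPE and RMSPE, since the abstract does not state the exact formulas used. The function names (`mape`, `rmspe`, `relative_reduction`) and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition, assumed)."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


def rmspe(actual, predicted):
    """Root mean square percentage error, in percent (standard definition, assumed)."""
    terms = [((a - p) / a) ** 2 for a, p in zip(actual, predicted)]
    return 100.0 * math.sqrt(sum(terms) / len(terms))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is smaller than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Made-up production values for illustration only (not data from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(f"MAPE  = {mape(actual, predicted):.2f}%")
print(f"RMSPE = {rmspe(actual, predicted):.2f}%")
# Hypothetical baseline error of 20% reduced to 5%.
print(f"Reduction = {relative_reduction(20.0, 5.0):.2f}%")
```

Both metrics divide each error by the actual value, so they are undefined when an actual production value is zero. RMSPE squares the relative errors before averaging, so it weights large deviations more heavily than MAPE does.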