High PM2.5 concentrations are becoming the leading cause of hazy days in China. However, studies have shown that variations in PM2.5 involve complicated physical and chemical processes, which makes accurate prediction challenging. Moreover, forecasts from numerical models frequently deviate from observed values. Deep learning is a promising alternative for prediction from massive time series data in meteorology. In the present study, a framework for PM2.5 concentration prediction is presented based on a three-dimensional convolutional neural network (3DCNN) and a long short-term memory (LSTM) neural network. Spatiotemporal sequence data were generated through preprocessing, correlation analysis, feature extraction, and transformation. In the spatiotemporal feature extraction phase, the 3DCNN was used to extract high-level spatial features, and the LSTM was used to extract temporal features. In the prediction phase, a fully connected (FC) layer was used to combine the spatial and temporal features. To examine the efficacy of the proposed model, PM2.5 concentration data, meteorological observation data, and a grid dataset collected at ten observation stations of the Beijing Meteorological Bureau (BMB) were used. The proposed model was compared with a support vector machine (SVM) and the existing PM2.5 forecast system at BMB, with root mean square error (RMSE) and mean absolute error (MAE) as the evaluation indicators. The experimental results showed that the proposed model performed best: over the ten stations, the minimum MAE was 3.24 μg/m³ and the minimum RMSE was 13.56 μg/m³. In addition, the proposed model overcame the underestimation produced by the existing PM2.5 forecast system at BMB and demonstrated superior performance for different forecast lengths over a 24-hour period.
The results also confirmed the effectiveness of the deep learning method in the prediction of PM2.5 concentration.
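
For reference, the two evaluation indicators named above can be computed as in the following minimal sketch. The function names and the sample concentrations are illustrative assumptions and are not data or code from this study.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: the average of |observed - predicted|."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: the square root of the mean squared difference."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical hourly PM2.5 concentrations in μg/m³ (made up for illustration).
observed = [35.0, 50.0, 80.0, 120.0]
predicted = [32.0, 54.0, 80.0, 110.0]

mae_value = mae(observed, predicted)
rmse_value = rmse(observed, predicted)
print(f"MAE  = {mae_value:.2f} μg/m³")
print(f"RMSE = {rmse_value:.2f} μg/m³")
```

Because RMSE squares each error before averaging, it weights large deviations more heavily than MAE does, so RMSE is never smaller than MAE on the same data. This may explain why the reported minimum RMSE exceeds the reported minimum MAE.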