Short-term passenger demand forecasting is of great importance to on-demand ride service platforms, which can use the forecasts to incentivize vacant cars to move from over-supply regions to over-demand regions. Short-term passenger demand forecasting is challenging, however, because spatial dependences, temporal dependences, and exogenous dependences need to be considered simultaneously. We propose a novel deep learning (DL) approach, named the fusion convolutional long short-term memory network (FCL-Net), to address these three dependences within one end-to-end learning architecture. The model is built by stacking and fusing multiple convolutional long short-term memory (LSTM) layers, standard LSTM layers, and convolutional layers. The fusion of convolutional techniques and the LSTM network enables the proposed DL approach to better capture the spatiotemporal characteristics and correlations of explanatory variables. A tailored spatially aggregated random forest is employed to rank the importance of the explanatory variables. The ranking is then used for feature selection. The proposed DL approach is applied to the short-term forecasting of passenger demand under an on-demand ride service platform in Hangzhou, China. Experimental results, validated on real-world data provided by DiDi Chuxing, show that the FCL-Net achieves better predictive performance than traditional approaches, including both classical time-series prediction models and neural-network-based algorithms (e.g., artificial neural network and LSTM). Furthermore, considering exogenous variables in addition to passenger demand itself, such as the travel time rate, time-of-day, day-of-week, and weather conditions, proves beneficial, since it reduces the root mean squared error (RMSE) by 50.9%. It is also interesting to find that feature selection reduces the dimension of the predictors by 30% while leading to only a 0.6% loss in forecasting accuracy, measured by RMSE, in the proposed model.
This paper is one of the first DL studies to forecast the short-term passenger demand of an on-demand ride service platform by examining the spatiotemporal correlations.

Keywords: On-demand ride services, short-term demand forecasting, deep learning (DL), fusion convolutional long short-term memory network (FCL-Net), long short-term memory (LSTM), convolutional neural network (CNN)
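
The RMSE-based comparisons quoted in the abstract (a 50.9% reduction, a 0.6% loss) are relative changes between two models' errors. The following minimal sketch shows one way such figures could be computed. The function names `rmse` and `relative_rmse_change` and the toy demand values are hypothetical illustrations, not taken from the paper or its data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse_change(baseline_rmse, new_rmse):
    """Percentage change of new_rmse relative to baseline_rmse.

    Negative values mean the new model has a lower error than the baseline.
    """
    if baseline_rmse == 0:
        raise ValueError("baseline RMSE must be non-zero")
    return (new_rmse - baseline_rmse) / baseline_rmse * 100.0


if __name__ == "__main__":
    # Hypothetical toy demand counts for a few grid cells (not real data).
    observed = [12.0, 7.0, 30.0, 18.0]
    demand_only_forecast = [9.0, 11.0, 24.0, 22.0]     # assumed baseline model
    with_exogenous_forecast = [11.0, 8.0, 28.0, 19.0]  # assumed richer model

    baseline = rmse(observed, demand_only_forecast)
    improved = rmse(observed, with_exogenous_forecast)
    change = relative_rmse_change(baseline, improved)
    print(f"baseline RMSE = {baseline:.3f}, improved RMSE = {improved:.3f}")
    print(f"relative change = {change:.1f}%")
```

Under this convention, a reported "reduction of 50.9%" would correspond to a relative change of -50.9, and a "0.6% loss" to a relative change of +0.6.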