Abstract: To make full use of the RGB+D dataset, which includes RGB data, 3D skeletal data, depth map sequences and infrared videos, this paper proposes an action recognition method for RGB+D videos that merges a multilayer recurrent neural network with two-stream convolutional networks, thereby combining RGB information with joint information. Experimental results show that the proposed multilayer recurrent network outperforms other recurrent networks on skeletal data. Moreover, combining it with the spatial network or the temporal network through nonlinear weighted score fusion further improves recognition accuracy. In the cross-view setting, accuracy exceeds that of the original method by 0.79%, 5.6%, 20.62% and 23.65% when using, respectively, the multilayer network alone, the multilayer network combined with the spatial network, the multilayer network combined with the temporal network, and all three networks combined.
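The abstract does not state the exact form of its nonlinear weighted score fusion. The following is a minimal sketch that assumes one common nonlinear rule: a weighted geometric mean of each stream's softmax class probabilities. The function names (`softmax`, `fuse_scores`), the stream weights and the example logits are all hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one stream's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(stream_logits: Sequence[Sequence[float]],
                weights: Sequence[float]) -> list[float]:
    """Weighted geometric-mean fusion of per-stream class probabilities.

    ASSUMPTION: an illustrative nonlinear fusion rule, not the paper's own.
    For each class c it computes sum_k w_k * log p_k(c), then renormalises
    the exponentiated values so the fused scores sum to 1.
    """
    if len(stream_logits) != len(weights):
        raise ValueError("one weight per stream is required")
    probs = [softmax(l) for l in stream_logits]
    num_classes = len(probs[0])
    log_fused = [
        sum(w * math.log(p[c]) for w, p in zip(weights, probs))
        for c in range(num_classes)
    ]
    # Softmax over the weighted log-probabilities renormalises the product.
    return softmax(log_fused)


# Hypothetical per-class scores from the three streams for one video clip.
rnn_logits = [2.0, 0.5, 0.1]       # multilayer recurrent network (skeleton)
spatial_logits = [0.3, 1.8, 0.2]   # spatial stream (RGB frames)
temporal_logits = [1.5, 1.4, 0.0]  # temporal stream (optical flow)

fused = fuse_scores([rnn_logits, spatial_logits, temporal_logits],
                    weights=[0.5, 0.2, 0.3])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because the fusion multiplies weighted probabilities rather than adding them, a stream that assigns very low probability to a class pulls that class down strongly; this is what makes the rule nonlinear, in contrast to a plain weighted average of scores.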