“…However, these models spend additional computation time extracting motion vectors and suffer from data alignment problems, which makes them computationally inefficient. Moreover, a few works have extended this design to four streams by adding motion information from depth sequences, achieving higher recognition accuracy than the earlier two-stream model [7]. Similarly, exploiting the properties of the RGB and depth modalities has produced efficient action recognition algorithms, such as depth rank pooling with CNNs [21], CNNs on scene-flow-based RGB-D channels [22], and sequence-based methods with RNNs [23].…”