Computer vision has advanced considerably in recent years thanks to deep learning and increased computational power. One of its most important applications is Human Action Recognition (HAR), which is used in intelligent video surveillance and security systems. This paper presents a deep learning-based HAR model that combines a 3D convolutional neural network (3DCNN), an LSTM multiplicative recurrent network, and YOLOv6 for real-time object detection. We trained the model extensively on several datasets: NTU-RGB-D, KITTI, NTU-RGB-D 120, UCF 101, and Fused datasets. The proposed model outperforms other state-of-the-art (SOTA) methods, including a traditional CNN, YOLOv6, and a CNN with BiLSTM, achieving accuracies of 98.23%, 97.65%, 98.76%, 95.45%, and 97.65% on the different datasets. These results suggest that the approach could substantially improve video surveillance systems and security measures.
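
As a rough illustration of how a 3D CNN can hand its features to a recurrent network such as an LSTM, the sketch below tracks tensor sizes only. It applies the standard convolution output-size formula along the frame, height, and width axes, then flattens the spatial axes so that the remaining frame axis serves as the sequence axis. The clip size, layer list, channel count, and the function names `conv3d_output_shape` and `lstm_input_shape` are all assumptions made for illustration; the abstract does not state the model's actual configuration, and the YOLOv6 detection stage is not represented here.

```python
from typing import Sequence, Tuple

Shape3D = Tuple[int, int, int]  # (frames, height, width)
Layer = Tuple[Shape3D, Shape3D, Shape3D]  # (kernel, stride, padding)


def conv3d_output_shape(size: Shape3D, kernel: Shape3D,
                        stride: Shape3D = (1, 1, 1),
                        padding: Shape3D = (0, 0, 0)) -> Shape3D:
    """Output size of a 3D convolution or pooling layer along each axis."""
    return tuple((n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(size, kernel, stride, padding))


def lstm_input_shape(clip: Shape3D, channels: int,
                     layers: Sequence[Layer]) -> Tuple[int, int]:
    """Apply the layers in order, then flatten height and width.

    Returns (sequence_length, feature_size) for a recurrent layer,
    where the sequence axis is the remaining frame axis.
    """
    size = clip
    for kernel, stride, padding in layers:
        size = conv3d_output_shape(size, kernel, stride, padding)
    frames, height, width = size
    return frames, channels * height * width


# Hypothetical configuration (not taken from the paper):
# a 16-frame 112x112 clip, two conv + pool stages, 64 final channels.
layers = [
    ((3, 3, 3), (1, 1, 1), (1, 1, 1)),  # convolution that preserves size
    ((1, 2, 2), (1, 2, 2), (0, 0, 0)),  # pooling over space only
    ((3, 3, 3), (1, 1, 1), (1, 1, 1)),  # convolution that preserves size
    ((2, 2, 2), (2, 2, 2), (0, 0, 0)),  # pooling that halves time and space
]
seq_len, feature_dim = lstm_input_shape((16, 112, 112), 64, layers)
print(seq_len, feature_dim)
```

Under these assumed settings, the recurrent layer would receive a sequence of 8 time steps, each a 50176-dimensional feature vector. In practice a projection layer is often placed before the recurrent network to reduce that feature size.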