Recognizing human actions has numerous practical applications that can solve problems and improve both effectiveness and quality of life across various domains. To address human action recognition from videos, a Deep Conv-LSTM architecture is applied to the UCF101 dataset. To preprocess the frames, a method is proposed that integrates two algorithms: Noise Cleaning and Dissimilarity-Based Key Frame Selection (KFS). Used together, these algorithms enhance image quality and eliminate unwanted data. Uniform Frame Selection, Dissimilarity-Based KFS, and the proposed method are evaluated and compared in terms of accuracy and data size reduction. The results show that Dissimilarity-Based KFS outperforms Uniform Frame Selection in accuracy by 2%, and that the proposed method improves accuracy by 3% over Dissimilarity-Based KFS and by 5% over Uniform Frame Selection. Furthermore, the proposed method reduces the data size by 26%, which lowers its computational cost.
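To illustrate how the two frame selection baselines differ, the sketch below contrasts uniform sampling with a simple dissimilarity-based key frame selector. It is a minimal sketch under stated assumptions, not the method evaluated here. The abstract does not specify the dissimilarity metric or threshold, so this example assumes a mean absolute pixel difference measured against the last kept key frame and a fixed threshold. Frames are represented as flattened grayscale intensity sequences. The function names `uniform_frame_selection`, `mean_abs_diff`, and `dissimilarity_kfs` are hypothetical, and the Noise Cleaning step is not shown.

```python
from typing import List, Sequence

# Hypothetical representation: a frame is a flattened grayscale image.
Frame = Sequence[float]


def uniform_frame_selection(num_frames: int, k: int) -> List[int]:
    """Pick k frame indices evenly spaced across the clip, ignoring content."""
    if k <= 0 or num_frames <= 0:
        return []
    if k >= num_frames:
        return list(range(num_frames))
    step = num_frames / k
    return [int(i * step) for i in range(k)]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Assumed dissimilarity metric: mean absolute pixel difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dissimilarity_kfs(frames: List[Frame], threshold: float) -> List[int]:
    """Keep a frame only if it differs from the last kept key frame by more
    than `threshold` (assumed selection rule); near-duplicates are dropped."""
    if not frames:
        return []
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keys[-1]]) > threshold:
            keys.append(i)
    return keys


if __name__ == "__main__":
    clip = [[0, 0, 0, 0], [1, 0, 0, 0], [10, 10, 10, 10],
            [10, 10, 10, 11], [50, 50, 50, 50]]
    keys = dissimilarity_kfs(clip, threshold=5.0)
    print(keys)  # [0, 2, 4]
    print(f"data size reduction: {1 - len(keys) / len(clip):.0%}")  # 60%? no: 40%
```

On this toy clip, the near-duplicate frames 1 and 3 are discarded and 2 of the 5 frames are dropped (a 40% reduction), whereas uniform sampling picks indices regardless of content. This is the intuition behind discarding redundant frames while keeping informative ones.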