Recognizing human activities and actions is central to understanding human behavior and plays a significant role in social communication. Human Action Recognition (HAR) is a challenging task with applications such as intelligent video surveillance and human-computer interaction. Existing video-based HAR models often require substantial computational resources and long processing times because of the high frame rate of the input. However, processing every frame is often unnecessary for recognizing a particular action. This paper therefore proposes an Efficient Human Action Recognition System (EHARS) built on a k-means clustering-based keyframe extraction technique, which addresses the computational inefficiency of existing models by discarding redundant and uninformative frames. The proposed EHARS uses a CNN-LSTM architecture in which a pre-trained VGG-16 model extracts spatial features and a Bidirectional Long Short-Term Memory (BiLSTM) network, which captures temporal features, performs classification. Evaluated on the widely used UCF-101 benchmark dataset, the proposed model achieves an accuracy of 97.65%, which compares favorably with several existing HAR models.
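The keyframe extraction idea can be illustrated with a minimal sketch. The abstract does not state which per-frame features are clustered, how k is chosen, or how a frame is picked from each cluster, so everything below is an assumption: the function name `select_keyframes` is hypothetical, each frame is assumed to be already reduced to a small numeric vector (for example a color histogram), and the frame closest to each cluster centroid is kept as that cluster's keyframe. The sketch uses only the Python standard library and plain Lloyd iterations; it is not the paper's implementation.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def select_keyframes(features, k, iters=20):
    """Return sorted indices of up to k representative frames.

    features: list of equal-length numeric vectors, one per frame
    (a hypothetical input; the paper's actual features are not given here).
    """
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of frames")

    # Assumed deterministic initialisation: k frames spread evenly over the clip.
    centroids = [list(features[(2 * i + 1) * n // (2 * k)]) for i in range(k)]

    assign = None
    for _ in range(max(1, iters)):
        # Assignment step: each frame joins its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: dist(f, centroids[c])) for f in features
        ]
        converged = new_assign == assign
        assign = new_assign
        if converged:
            break
        # Update step: move each centroid to the mean of its member frames.
        for c in range(k):
            members = [features[i] for i in range(n) if assign[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    # Keep, for each non-empty cluster, the frame closest to its centroid.
    keyframes = []
    for c in range(k):
        members = [i for i in range(n) if assign[i] == c]
        if members:
            keyframes.append(
                min(members, key=lambda i: dist(features[i], centroids[c]))
            )
    return sorted(keyframes)


# Toy clip: nine frames forming three visually similar groups.
features = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [9.0], [9.1], [9.2]]
print(select_keyframes(features, 3))  # [1, 4, 7]
```

In this toy clip the nine frames collapse to three keyframes, one per group of near-duplicate frames, so a downstream feature extractor and sequence classifier would process a third of the original input. Returning the indices in sorted order preserves the temporal order that a recurrent classifier depends on.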