In education, early prediction of student grades is critical because it enables timely intervention: it helps identify students with learning difficulties, supports dropout prevention, and facilitates individualized learning. Nonlinear dynamic models, notably Recurrent Neural Networks (RNNs), have proven effective at capturing the intricate relationships within student performance data, surpassing the limitations of traditional time series methods. However, RNNs suffer from the vanishing gradient problem, in which repeated multiplication by the weight matrix during backpropagation causes gradient values to shrink sharply. To address this challenge, we introduce a novel loss function, MSECosine, constructed by combining two established loss functions: Mean Square Error (MSE) and LogCosh. The logarithmic term tempers the error amplification caused by the squared term in MSE. The primary contribution of this study is to propose the MSECosine loss function and demonstrate its efficacy. To evaluate it, we use two self-collected datasets comprising learning management system (LMS) and assessment records. These datasets serve as the testing ground for four deep time series models: Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory network (LSTM), and CNN-LSTM. Across 29 carefully designed feature selection models, LSTM emerges as the best-performing model. Building on this result, we further improve the LSTM model by training it with the proposed MSECosine loss function, yielding an enhanced model termed eLSTM. Experimental results show that eLSTM achieves an accuracy of 0.6191 and a substantially reduced error rate of 0.1738.
These results surpass those of alternative approaches, highlighting the instrumental role of the MSECosine loss function in enabling eLSTM to make more accurate early predictions of student grades.
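The abstract names the two components of MSECosine (MSE and LogCosh) but does not specify how they are combined. The sketch below is a minimal illustration, assuming a simple weighted sum with a hypothetical mixing parameter `alpha`; the actual formulation in the paper may differ.

```python
import numpy as np

def msecosine_loss(y_true, y_pred, alpha=0.5):
    """Illustrative MSECosine loss: a weighted sum of MSE and LogCosh.

    NOTE: the weighting scheme (alpha) is an assumption for illustration;
    the paper's exact combination of the two terms is not given in the abstract.
    """
    err = y_pred - y_true
    mse = np.mean(err ** 2)                      # squared-error term
    logcosh = np.mean(np.log(np.cosh(err)))      # log-cosh term, grows ~linearly for large errors
    return alpha * mse + (1.0 - alpha) * logcosh

# Example: the log-cosh component dampens the penalty on large errors
# relative to pure MSE, which is the motivation stated in the abstract.
y_true = np.array([3.0, 2.5, 4.0])
y_pred = np.array([2.8, 2.9, 3.5])
loss = msecosine_loss(y_true, y_pred)
```

For large residuals, `log(cosh(x))` behaves like `|x| - log 2`, so the combined loss grows more slowly than MSE alone, which accords with the abstract's claim that the logarithmic term mitigates error escalation from the squared term.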