In basic education, a timely and accurate grasp of students’ classroom learning status gives teachers and administrators real-time reference information and a basis for overall evaluation, which makes it highly valuable in educational practice. Many information technologies are now applied to analyzing students’ classroom behavior. Among them, state analysis based on classroom video is timely, covers many dimensions, and handles large volumes of data. This makes it especially well suited to acquiring and analyzing students’ classroom states, and it has attracted the attention of major educational technology companies. However, current video-based techniques for acquiring student states are poorly adapted to large classroom scenes and have limited practicality. In addition, existing video-based analysis of classroom behavior mainly focuses on a single behavioral feature, which cannot fully reflect a student’s classroom behavior state. To address these problems, this study presents a face recognition algorithm for student classroom videos and its implementation process, improves a hybrid face detection model built on a traditional model, and proposes a vision transformer-based neural network for student expression recognition. Experimental results show that the proposed algorithm, applied to students’ classroom videos, can effectively detect students’ attention and emotional states in class.