Source camera identification verifies whether two videos were shot by the same device, which is of great significance in multimedia forensics. Most existing methods use convolutional neural networks to learn sensor noise patterns and identify the source camera in closed forensic scenarios. While these methods have achieved remarkable results, they remain constrained by two primary challenges: (1) interference from semantic information and (2) mismatched feature distributions across datasets. The former interferes with the model's extraction of effective features; the latter causes the model to overfit the feature distribution of the training data and become sensitive to unseen data. To address these challenges, we propose a novel source camera identification framework that determines whether two videos were shot by the same device by measuring the similarity between their source camera features. First, we extract video key frames and use integral images to optimize the inter-pixel-variance smooth-block selection algorithm, removing the interference of video semantic information. Second, we design a residual neural network fused with a constraint layer to adaptively learn video source features. Third, we introduce a triplet-loss metric learning strategy to optimize the network model and improve its discriminability. Finally, we design a multi-dimensional feature-vector similarity fusion strategy to achieve highly generalized source camera identification. Extensive experiments show that our method achieves an AUC of up to 0.9714 in closed-set forensic scenarios and 0.882 in open-set scenarios, a 5% improvement over the best baseline method. Furthermore, our method also demonstrates effectiveness in the task of deepfake detection.
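To illustrate the integral-image speedup behind the smooth-block selection step, the sketch below computes the variance of every fixed-size block of a frame in constant time per block using summed-area tables of pixel values and squared values. This is a minimal illustration, not the authors' implementation; the function name, block size, and selection threshold are assumptions.

```python
import numpy as np

def block_variance_map(img, block):
    # Hypothetical helper: variance of every (block x block) window,
    # computed via integral images so each window costs O(1) instead
    # of O(block^2). Smooth (low-variance) blocks can then be selected
    # to suppress semantic content before noise-pattern extraction.
    img = img.astype(np.float64)
    # Zero-padded integral images of pixel values and squared values.
    S = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    S2 = np.pad(img ** 2, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    b = block
    # Each window sum is four corner lookups into the integral image.
    sum1 = S[b:, b:] - S[:-b, b:] - S[b:, :-b] + S[:-b, :-b]
    sum2 = S2[b:, b:] - S2[:-b, b:] - S2[b:, :-b] + S2[:-b, :-b]
    n = b * b
    return sum2 / n - (sum1 / n) ** 2  # Var[x] = E[x^2] - E[x]^2
```

A selection rule could then keep, say, the lowest-variance quartile of blocks as candidates for source-feature extraction; the exact criterion is a design choice not specified in the abstract.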