Speaker identification is a recognition problem that entails identifying a speaker from consecutive time-series information. Because voice is a continuous one-dimensional time stream, most recent experimental techniques use convolutional neural networks (CNNs) or deep neural networks (DNNs). Because audio can be represented as a spectrogram, utterances have spatial attributes (corresponding to the speech spectra), and CNNs are well suited to extracting such spatial characteristics. At the same time, the signal is time-series data, and deep models capture long utterances better than shallow ones. This work presents a DNN model for speaker identification using a jump-connected (skip-connected) one-dimensional convolutional neural network (1-D CNN) with a focus module (FM). In the presented model, the 1-D convolutional layers integrated with the FM extract speaker characteristics and reduce heterogeneity in the temporal and spatial domains, allowing for faster layer processing. Furthermore, stacked CNN jump connections are employed to overcome connectivity glitches, and a solution based on a combined softmax loss and smooth L1-norm regularization is presented to increase efficiency. The suggested network model was tested on the ELSDSR, TIMIT, NIST, 16000PCM, and experimental audio datasets. According to the experimental data, the Equal Error Rate (EER) of the end-to-end CNN for voiceprint identification improves by 9.02% compared with baseline approaches. Our proposed DNN model, which we term the deep FM-1D CNN, achieved a high recognition accuracy of 99.21% in the experiments. At the same time, the observations show that the suggested network model outperforms other models in terms of robustness. With further optimization, this method could be applied to other tasks, such as language modelling.
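
As a concrete illustration, the following PyTorch sketch shows one plausible reading of the deep FM-1D CNN: stacked 1-D convolutional blocks over the raw waveform, each carrying an identity jump (skip) connection and a focus module. The abstract does not specify the FM's internal design, so it is assumed here to be a channel-attention-style gate; all class names (FocusModule, SkipBlock, FM1DCNN), layer sizes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FocusModule(nn.Module):
    """Hypothetical channel-attention gate standing in for the paper's
    focus module (FM); the exact design is not given in the abstract."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),                       # squeeze the time axis
            nn.Conv1d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):          # x: (batch, channels, time)
        return x * self.gate(x)    # re-weight channels by learned importance

class SkipBlock(nn.Module):
    """1-D conv block with an identity jump (skip) connection."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(channels)
        self.fm = FocusModule(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual path: conv -> batch norm -> focus module, then add input.
        return self.act(x + self.fm(self.bn(self.conv(x))))

class FM1DCNN(nn.Module):
    """Sketch of the deep FM-1D CNN: a strided stem, stacked skip blocks,
    and temporal average pooling into a speaker classification head."""
    def __init__(self, n_speakers, channels=64, depth=4):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size=7, stride=2, padding=3)
        self.blocks = nn.Sequential(*[SkipBlock(channels) for _ in range(depth)])
        self.head = nn.Linear(channels, n_speakers)

    def forward(self, wav):                 # wav: (batch, 1, samples)
        h = self.blocks(self.stem(wav))
        return self.head(h.mean(dim=-1))    # average over time, then classify

# Example usage on a batch of 1-second, 16 kHz waveforms:
# model = FM1DCNN(n_speakers=630)            # e.g. TIMIT has 630 speakers
# logits = model(torch.randn(8, 1, 16000))   # -> shape (8, 630)
```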
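The combined objective can likewise be sketched, assuming the softmax loss is standard cross-entropy over speaker logits and the smooth L1-norm acts as a weight penalty; the mixing coefficient reg_weight is a hypothetical hyperparameter, since the abstract does not give one.

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, labels, model, reg_weight=1e-4):
    """Softmax (cross-entropy) loss plus a smooth L1-norm penalty on the
    network weights, as one reading of the paper's combined regulation."""
    ce = F.cross_entropy(logits, labels)
    # Smooth L1 distance of each parameter tensor from zero: quadratic
    # near zero, linear for large weights, so it is a differentiable
    # compromise between L1 and L2 regularization.
    reg = sum(F.smooth_l1_loss(p, torch.zeros_like(p), reduction="sum")
              for p in model.parameters())
    return ce + reg_weight * reg
```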