Text recognition is critical in various domains, including driving assistance, handwriting recognition, and assistive technology for the visually impaired. In recent years, deep learning-based methods have demonstrated outstanding performance in Scene Text Recognition (STR). However, STR remains challenging, and the scarcity of datasets for non-Latin scripts compounds the difficulty. To address this, we collected a dataset of Persian digits comprising 20,000 images that exhibit a variety of challenges, making the dataset suitable for text recognition tasks. Furthermore, we propose a convolution-based model for Persian digit recognition that incorporates a squeeze-and-excitation gate, which reweights feature channels so that the model focuses on informative latent features, and connectionist temporal classification (CTC), which enables end-to-end sequence learning. We conduct extensive comparisons with different architectures and models to evaluate the performance of the proposed model. Our approach achieves an accuracy of 94.26% on our dataset. The results show that our model outperforms the compared methods, highlighting its effectiveness in Persian digit recognition.
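To make the two named components concrete, the following is a minimal, dependency-free sketch of a squeeze-and-excitation gate and of best-path (greedy) CTC decoding. It is not the paper's implementation: the function names (`se_gate`, `ctc_greedy_decode`, `to_persian_digits`), the weight matrices `w1` and `w2`, and the class layout (index 0 as the CTC blank, indices 1 to 10 as the digits ۰ to ۹) are all assumptions made for illustration.

```python
import math

# Assumed label layout (hypothetical): index 0 is the CTC blank,
# indices 1..10 map to the Persian digits zero through nine.
PERSIAN_DIGITS = "۰۱۲۳۴۵۶۷۸۹"


def se_gate(feature_map, w1, w2):
    """Squeeze-and-excitation over a feature map shaped [channels][height][width].

    w1 is the bottleneck weight matrix [hidden][channels] and
    w2 is the expansion weight matrix [channels][hidden] (biases omitted).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: fully connected layer + ReLU, then fully connected + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates


def ctc_greedy_decode(scores, blank=0):
    """Best-path CTC decoding over per-time-step class scores [time][classes].

    Takes the argmax at each step, collapses consecutive repeats,
    and drops blanks.
    """
    best_path = [max(range(len(step)), key=step.__getitem__) for step in scores]
    decoded, previous = [], blank
    for index in best_path:
        if index != previous and index != blank:
            decoded.append(index)
        previous = index
    return decoded


def to_persian_digits(indices):
    """Map decoded class indices to a digit string under the assumed layout."""
    return "".join(PERSIAN_DIGITS[i - 1] for i in indices)
```

In this sketch the blank symbol is what lets CTC emit a repeated digit: the best path `[4, 4, 0, 4, 8]` decodes to the class indices `[4, 4, 8]`, because the blank between the second and third steps separates two genuine occurrences of the same class. Training with the CTC loss itself, which sums over all alignments, is not shown here.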