Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning

Arafat, Syed Yasser; Iqbal, Muhammad

doi:10.1109/access.2020.2994214

Cited by 52 publications

(26 citation statements)

References 53 publications

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…To compare the outcomes, Daniyal et al presented the Bluechet and Puchserver models. Arafat et al [81] introduce Hijja, a novel dataset of Arabic letters produced only by youngsters aged 7-12. A total of 591 individuals contributed 47,434 characters to our dataset.…”

Section: Literature Reviewmentioning

confidence: 99%

Attention-Based CNN-RNN Arabic Text Recognition from Natural Scene Images

et al. 2021

View full text Add to dashboard Cite

According to statistics, there are 422 million speakers of the Arabic language. Islam is the second-largest religion in the world, and its followers constitute approximately 25% of the world’s population. Since the Holy Quran is in Arabic, nearly all Muslims understand the Arabic language per some analytical information. Many countries have Arabic as their native and official language as well. In recent years, the number of internet users speaking the Arabic language has been increased, but there is very little work on it due to some complications. It is challenging to build a robust recognition system (RS) for cursive nature languages such as Arabic. These challenges become more complex if there are variations in text size, fonts, colors, orientation, lighting conditions, noise within a dataset, etc. To deal with them, deep learning models show noticeable results on data modeling and can handle large datasets. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) can select good features and follow the sequential data learning technique. These two neural networks offer impressive results in many research areas such as text recognition, voice recognition, several tasks of Natural Language Processing (NLP), and others. This paper presents a CNN-RNN model with an attention mechanism for Arabic image text recognition. The model takes an input image and generates feature sequences through a CNN. These sequences are transferred to a bidirectional RNN to obtain feature sequences in order. The bidirectional RNN can miss some preprocessing of text segmentation. Therefore, a bidirectional RNN with an attention mechanism is used to generate output, enabling the model to select relevant information from the feature sequences. An attention mechanism implements end-to-end training through a standard backpropagation algorithm.

show abstract

Section: Literature Reviewmentioning

confidence: 99%

Attention-Based CNN-RNN Arabic Text Recognition from Natural Scene Images

et al. 2021

View full text Add to dashboard Cite

show abstract

“…They have been adapted for feature extraction in many text recognition systems. We can cite for example: scene text recognition [12], [13], video text recognition [14], and offline handwriting text recognition [15]- [17]. However, CNN-based or DL-based approaches are still deficient.…”

Section: ) Feature Extractionmentioning

confidence: 99%

Deep Sparse Auto-Encoder Features Learning for Arabic Text Recognition

et al. 2021

View full text Add to dashboard Cite

One of the most recent challenging issues of pattern recognition and artificial intelligence is Arabic text recognition. This research topic is still a pervasive and unaddressed research field, because of several factors. Complications arise due to the cursive nature of the Arabic writing, character similarities, unlimited vocabulary, use of multi-size and mixed-fonts, etc. To handle these challenges, an automatic Arabic text recognition requires building a robust system by computing discriminative features and applying a rigorous classifier together to achieve an improved performance. In this work, we introduce a new deep learning based system that recognizes Arabic text contained in images. We propose a novel hybrid network, combining a Bag-of-Feature (BoF) framework for feature extraction based on a deep Sparse Auto-Encoder (SAE), and Hidden Markov Models (HMMs), for sequence recognition. Our proposed system, termed BoF-deep SAE-HMM, is tested on four datasets, namely the printed Arabic line images Printed KHATT (P-KHATT), the benchmark printed word images Arabic Printed Text Image (APTI), the benchmark handwritten Arabic word images IFN/ENIT, and the benchmark handwritten digits images Modified National Institute of Standards and Technology (MNIST).

show abstract

“…The deep learning models of non‐spotting in natural and video images can be classified further into regression/anchor based models (Arafat & Iqbal, 2020; Chandio & Pickering, 2019; Huang & Xu, 2019) and segmentation based models (Cai et al, 2020; Dai et al, 2020; Guo et al, 2020). The former class considers the whole text as an object for detection while the latter class considers merging pixel by pixel or character by character for text detection in natural and video images.…”

Section: Non‐spotting Based Mining Approachesmentioning

confidence: 99%

Mining text from natural scene and video images: A survey

Shivakumara

Alaei

Pal

2021

WIREs Data Min & Knowl

View full text Add to dashboard Cite

In computer terminology, mining is considered as extracting meaningful information or knowledge from a large amount of data/information using computers. The meaningful information can be extracted from normal text, and images obtained from different resources, such as natural scene images, video, and documents by deriving semantics from text and content of the images. Although there are many pieces of work on text/data mining and several survey/review papers are published in the literature, to the best of our knowledge there is no survey paper on mining textual information from the natural scene, video, and document images considering word spotting techniques. In this article, we, therefore, provide a comprehensive review of both the non‐spotting and spotting based mining techniques. The mining approaches are categorized as feature, learning and hybrid‐based methods to analyze the strengths and limitations of the models of each category. In addition, it also discusses the usefulness of the methods according to different situations and applications. Furthermore, based on the review of different mining approaches, this article identifies the limitations of the existing methods and suggests new applications and future directions to continue the research in multiple directions. We believe such a review article will be useful to the researchers to quickly become familiar with the state‐of‐the‐art information and progresses made toward mining textual information from natural scene and video images. This article is categorized under: Algorithmic Development > Text Mining

show abstract

Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning

Cited by 52 publications

References 53 publications

Attention-Based CNN-RNN Arabic Text Recognition from Natural Scene Images

Attention-Based CNN-RNN Arabic Text Recognition from Natural Scene Images

Deep Sparse Auto-Encoder Features Learning for Arabic Text Recognition

Mining text from natural scene and video images: A survey

Contact Info

Product

Resources

About