Multi‐dimensional long short‐term memory networks for artificial Arabic text recognition in news video

Zayene, Oussama; Touj, Sameh Masmoudi; Hennebert, Jean; Ingold, Rolf; Amara, Najoua Essoukri Ben

doi:10.1049/iet-cvi.2017.0468

Cited by 35 publications

(19 citation statements)

References 50 publications

“…It is however important to recall that UPTI contains synthetically generated text lines and does not offer the same kind of recognition challenges as those encountered in case of scanned documents or video text. In case of video text, Zayene et al [114] reported a 96.85% recognition rate on a relatively smaller set of around 8000 Arabic text lines. For Urdu caption text, Tayyab et al [134] report a 93% recognition rate on approximately 20,000 text lines.…”

Section: Text Recognition Results (mentioning)

confidence: 99%

Detection and recognition of cursive text from video frames

Mirza, Zeshan, Atif, et al. 2020

J Image Video Proc.

Textual content appearing in videos represents an interesting index for semantic retrieval of videos (from archives), generation of alerts (live streams), as well as high-level applications like opinion mining and content summarization. The key components of such systems require detection and recognition of textual content, which is also the subject of our study. This paper presents a comprehensive framework for detection and recognition of textual content in video frames. More specifically, we target cursive scripts, taking Urdu text as a case study. Detection of textual regions in video frames is carried out by fine-tuning deep neural network-based object detectors for the specific case of text detection. The script of the detected textual content is identified using convolutional neural networks (CNNs), while for recognition we propose UrduNet, a combination of CNNs and long short-term memory (LSTM) networks. A benchmark dataset containing cursive text with more than 13,000 video frames is also developed. A comprehensive series of experiments is carried out, reporting an F-measure of 88.3% for detection and a recognition rate of 87%.

“…In the context of cursive text, a holistic technique based on multi-dimensional LSTMs is presented in [114] for recognition of Arabic video text. The technique is evaluated on two datasets ACTiV [115] and the ALIF [116,117] and reports high recognition rates.…”

Section: Text Recognition (mentioning)

confidence: 99%

Detection and recognition of cursive text from video frames

Mirza, Zeshan, Atif, et al. 2020

J Image Video Proc.

“…In recent years, several novel works for cursive text detection and recognition in video images have been developed [51]- [54], while a limited work is presented for cursive text recognition in natural scenes [55]- [57]. Ahmed et al [55], modified the maximally stable extremal region method to extract the scale-invariant features and passed to the multi-dimensional long short term memory (MDLSTM) classifier.…”

Section: B Cursive Text Recognition In Video and Natural Scene Images (mentioning)

confidence: 99%

Cursive Character Recognition in Natural Scene Images Using a Multilevel Convolutional Neural Network Fusion

2020

The accuracy of current natural scene text recognition algorithms is limited by the poor performance of character recognition methods for these images. The complex backgrounds, variations in the writing, text size, orientations, low resolution and multi-language text make recognition of text in natural images a complex and challenging task. Conventional machine learning and deep learning-based methods have been developed that have achieved satisfactory results, but character recognition for cursive text such as Arabic and Urdu scripts in natural images is still an open research problem. The characters in the cursive text are connected and are difficult to segment for recognition. Variations in the shape of a character due to its different positions within a word make the recognition task more challenging than non-cursive text. Optical character recognition (OCR) techniques proposed for Arabic and Urdu scanned documents perform very poorly when applied to character recognition in natural images. In this paper, we propose a multiscale feature aggregation (MSFA) and a multi-level feature fusion (MLFF) network architecture to recognize isolated Urdu characters in natural images. The network first aggregates multi-scale features of the convolutional layers by up-sampling and addition operations and then combines them with the high-level features. Finally, the outputs of the MSFA and MLFF networks are fused together to create more robust and powerful features. A comprehensive dataset of segmented Urdu characters is developed for the evaluation of the proposed network models. Synthetic text on the patches of images with real natural scene backgrounds is generated to increase the samples of infrequently used characters. The proposed model is evaluated on the Chars74K and ICDAR03 datasets. To validate the proposed model on the new Urdu character image dataset, we compare its performance with the histogram of oriented gradients (HoG) method. 
The experimental results show that the aggregation of multi-scale and multilevel features and their fusion is more effective, and outperforms other methods on the Urdu character image and Chars74K datasets.

Index Terms: Cursive text recognition, natural scene Urdu character recognition, multi-scale feature aggregation, multi-level feature fusion, convolutional neural network (CNN)

“…These results have been compared with Sakhr, ABBYY, and NovoDynamics, which are known commercial Arabic OCR systems, and the results were promising. Zayene et al (2018b) presented an Arabic video embedded text recognition system based on a deep learning approach. They used an MDLSTM network as the input layers, so that the MDLSTM learns the features from the raw input image; for the output layer they used CTC with a softmax activation function. The suggested method has been trained and evaluated using the AcTiV-R database, which is part of the AcTiv dataset and consists of 10,415 text-line images and 44,583 words.…”

Section: Arabic Text Recognition With Deep Learning (mentioning)

confidence: 99%

A Review of Arabic Text Recognition Dataset

Al-Sheikh, Mohd, Warlina

2020

APJITM

Building a robust Optical Character Recognition (OCR) system for languages with cursive scripts, such as Arabic, has always been challenging. These challenges increase if the text contains diacritics of different sizes for characters and words. Apart from the complexity of the font used, these challenges must be addressed in recognizing the text of the Holy Quran. To solve these challenges, the OCR system would have to undergo different phases. Each problem would have to be addressed using different approaches; thus, researchers are studying these challenges and proposing various solutions. This has motivated this study to review Arabic OCR datasets, because the dataset plays a major role in determining the nature of the OCR system. State-of-the-art approaches in segmentation and recognition are identified, involving Recurrent Neural Networks (Long Short-Term Memory, LSTM, and Gated Recurrent Unit, GRU) used with Connectionist Temporal Classification (CTC). This also includes deep learning models and the implementation of GRU in the Arabic domain. This paper contributes by profiling Arabic text recognition datasets, thereby determining the nature of the OCR systems developed, and identifies research directions in building Arabic text recognition datasets.
