Persian Optical Character Recognition Using Deep Bidirectional Long Short-Term Memory

Khosrobeigi, Zohreh; Veisi, Hadi; Hoseinzade, Ehsan; Shabanian, Hanieh

doi:10.3390/app122211760

Cited by 3 publications

(1 citation statement)

References 49 publications

“…In [11], the authors proposed an Urdu Nastaliq Handwritten Dataset (UNHD), which is written by 500 writers on A4-size paper and is available on request (https://www.kaggle.com/datasets/drsaadbinahmed/unhd-dataset, accessed on 28 March 2023). Khosrobeigi et al [36] also presented a Persian language dataset; this dataset is collected from different Persian-language new websites, and the description of the dataset is shown in Table 2; this dataset is split into 80% for training and 20% for testing purpose. There are some datasets available that are used for handwritten text recognition of Urdu, and, as we know, Urdu and Arabic use the same vocabulary and alphabet as well.…”

Section: Dataset (mentioning)

confidence: 99%

A Survey of OCR in Arabic Language: Applications, Techniques, and Challenges

et al. 2023

Optical character recognition (OCR) is the process of extracting handwritten or printed text from a scanned or printed image and converting it to a machine-readable form for further data processing, such as searching or editing. Automatic text extraction using OCR helps to digitize documents for improved productivity and accessibility and for preservation of historical documents. This paper provides a survey of the current state-of-the-art applications, techniques, and challenges in Arabic OCR. We present the existing methods for each step of the complete OCR process to identify the best-performing approach for improved results. This paper follows the keyword-search method for reviewing the articles related to Arabic OCR, including the backward and forward citations of the article. In addition to state-of-the-art techniques, this paper identifies research gaps and presents future directions for Arabic OCR.

Closing Editorial for Computer Vision and Pattern Recognition Based on Deep Learning

Yuan

2024

Applied Sciences

Investigating the Challenges and Opportunities in Persian Language Information Retrieval through Standardized Data Collections and Deep Learning

Moniri, Schlosser, Kowerko

2024

Computers

The Persian language, also known as Farsi, is distinguished by its intricate morphological richness, yet it contends with a paucity of linguistic resources. With an estimated 110 million speakers, it finds prevalence across Iran, Tajikistan, Uzbekistan, Iraq, Russia, Azerbaijan, and Afghanistan. However, despite its widespread usage, scholarly investigations into Persian document retrieval remain notably scarce. This circumstance is primarily attributed to the absence of standardized test collections, which impedes the advancement of comprehensive research endeavors within this realm. As data corpora are the foundation of natural language processing applications, this work aims at Persian language datasets to address their availability and structure. Subsequently, we motivate a learning-based framework for the processing of Persian texts and their recognition, for which current state-of-the-art approaches from deep learning, such as deep neural networks, are further discussed. Our investigations highlight the challenges of realizing such a system while emphasizing its possible benefits for an otherwise rarely covered language.
