2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) 2017
DOI: 10.1109/icdar.2017.30

Impact of Ligature Coverage on Training Practical Urdu OCR Systems

citations
Cited by 7 publications
(3 citation statements)
references
References 16 publications
“…The dataset consists of approximately 10,000 text lines, which are insufficient to train a reasonably good handwriting text recognition engine. The standard method of increasing the training samples is to introduce data augmentation; however, we argue that data augmentation methods are not helpful in training a text recognition system. In [22], we showed that ligature coverage has a positive impact in improving the accuracy of a text recognition system. It is specifically true for Arabic like scripts where the number of ligatures are huge.…”
mentioning
confidence: 93%
“…From the literature, we noticed that mostly the transfer learned networks are effectively smeared on the printed script rather than handwritten, and we also observe that CNN-based transfer learning is effectively applied to the non-cursive script like Chinese, Latin, Bangala, and Devanagari, etc. Relatively very few research have been concentrated on cursive scripts like Arabic [18], Urdu [19], and Farsi [20], where the experiment carried out on regular documents, and the network models are trained using specific benchmark datasets like UNHD [21], UPTI [22], EMILLE [23] and WordNet [24]. These datasets consist of handwritten text lines and word images written by various writers.…”
Section: Introduction
mentioning
confidence: 99%
“…While for recognition of Urdu characters from outdoor images there are few custom datasets [11], [15], [25] and for recognition of printed characters words there is a famous dataset UPTI [24], which recently has been updated and has been presented with name UPTI2.0 [38] because the performance on UPTI has reached near saturation [33], [35]. There also exist CLE-18000 [32], [39] which contains near 18K ligatures (compound characters).…”
Section: Introduction
mentioning
confidence: 99%