2012 International Conference on Frontiers in Handwriting Recognition 2012
DOI: 10.1109/icfhr.2012.227

Layout Analysis for Arabic Historical Document Images Using Machine Learning

Abstract: Page layout analysis is a fundamental step of any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-note text) from manuscripts with complex layout formats. Simple and discriminative features are extracted at the connected-component level, and robust feature vectors are subsequently generated. A multilayer perceptron classifier is used to classify connected components into the relevant class of text. A voting scheme is then applied to refine th…
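The abstract is truncated before it describes the voting scheme, so the details of that step are not known here. As a rough illustration only, the sketch below assumes one plausible form of such a refinement: each connected component is relabelled by a majority vote over itself and its nearest neighbours. The function name `refine_labels`, the centroid representation, the parameter `k`, and the labels `"main"` and `"side"` are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import hypot


def refine_labels(centroids, labels, k=4):
    """Relabel each component by majority vote over itself and its k nearest
    neighbours (an assumed refinement rule, not the paper's actual scheme)."""
    refined = []
    for i, (xi, yi) in enumerate(centroids):
        # Rank all other components by Euclidean distance to component i.
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: hypot(centroids[j][0] - xi, centroids[j][1] - yi),
        )
        voters = [i] + others[:k]
        # Votes are counted on the original labels, so the update is synchronous.
        votes = Counter(labels[j] for j in voters)
        best = max(votes.values())
        if votes[labels[i]] == best:
            # On a tie (or a win), keep the classifier's original label.
            refined.append(labels[i])
        else:
            refined.append(votes.most_common(1)[0][0])
    return refined


# Hypothetical component centroids: a main-body cluster and a margin cluster.
centroids = [(100, 100), (110, 100), (120, 100), (130, 100), (140, 100),
             (10, 300), (12, 310), (14, 320), (16, 330), (18, 340)]
# Hypothetical classifier output with one misclassified main-body component.
labels = ["main", "main", "side", "main", "main",
          "side", "side", "side", "side", "side"]
refined = refine_labels(centroids, labels, k=4)
```

In this toy input the third component sits among main-body components, so its four nearest neighbours outvote its initial `"side"` label. Using the original labels for every vote keeps the result independent of the order in which components are visited.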

Citations: cited by 54 publications (35 citation statements)
References: 15 publications
“…Syed Saqib Bukhari et al 10 presented the application of a document management system to handle the problem of the automatic processing of documents available in film archives. They used machine learning algorithms for segmenting side notes from the main body text for complex Arabic documents.…”
Section: Requirements and Related Work (mentioning)
confidence: 99%
“…Artificial Neural Networks were further tested on Arabic document layout analysis schemes. Bukhari et al [14] differentiated the central body and the side manuscript by applying the Multilayer Perceptron (MLP) classifier. A dataset is created which includes 38 historical document images and they achieved 95% classification accuracy.…”
Section: Related Work (mentioning)
confidence: 99%
“…Page layout analysis methods can be categorized into three classes: granular-based, block-based and texture-based methods. Granular-based techniques [7]- [12] group basic layout entities of the page, e.g., pixels or connected components, to form larger homogeneous regions. Block-based approaches [13]- [15] segment the image into regions and subsequent splitting and merging steps are applied until yielding homogeneous regions.…”
Section: Introduction (mentioning)
confidence: 99%