2004
DOI: 10.1007/978-3-540-28640-0_17

An Integrated Approach for Automatic Semantic Structure Extraction in Document Images

Abstract: In this paper we present an integrated approach for semantic structure extraction in document images. Document images are initially processed to extract both their layout and logical structures on the basis of geometrical and spatial information. Then, the textual content of logical components is employed for automatic semantic labeling of layout structures. To support the whole process, different machine learning techniques are applied. Experimental results on a set of biomedical multi-page documents are …

Cited by 5 publications (2 citation statements)
References 10 publications (9 reference statements)
“…Nowadays pattern recognition techniques are often used [3] and they allow to perform indexing starting from a first phase of knowledge extraction. This techniques assume the availability of scanned documents with a resolution enough high to perform some OCR, and it's not our case (there are several handy written documents for example) at least for the treatment of the paper documents.…”
Section: Related Work (mentioning)
confidence: 99%
“…The extraction of such physical layout information is traditionally concerned with scanned images (e.g. OCR) [3] [4], but it is difficult to extract the layout information from electronic documents and engineering drawings. In this paper, we propose a document analysis method, which extracts text and layout information from various documents.…”
Section: Introduction (mentioning)
confidence: 99%