2001
DOI: 10.1007/pl00013569

Transforming paper documents into XML format with WISDOM++

Abstract: The transformation of scanned paper documents to a form suitable for an Internet browser is a complex process that requires solutions to several problems.

Cited by 61 publications (33 citation statements). References: 21 publications.
“…1 is a document analysis system that can transform textual black and white paper documents into XML format [2]. This is a complex process involving several steps.…”
Section: Wisdom++ (mentioning)
“…By sorting the dictionary with respect to MaxTF(i,t), words occurring frequently only in one document might be favored. By sorting each class dictionary according to the product MaxTF(i,t)*PF(i,t)², briefly denoted as MaxTF-PF² (Max Term Frequency Square Page Frequency) measure, the effect of this phenomenon is kept under control. Moreover, common words used in documents of a given class will appear in the first entries of the corresponding class dictionary.…”
Section: The Feature Extractor Module (mentioning)
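To make the MaxTF-PF² ordering concrete, here is a minimal Python sketch under stated assumptions. The quoted passage does not define its terms, so it assumes that MaxTF(i,t) is the highest relative frequency of word i in any single page of class t, and that PF(i,t) is the number of pages of class t on which i occurs. The function name `rank_class_dictionary` and its `pf_exponent` parameter are hypothetical and are not taken from WISDOM++.

```python
from collections import Counter


def rank_class_dictionary(class_pages, pf_exponent=2):
    """Order the words of one class dictionary by MaxTF * PF**pf_exponent.

    Hypothetical sketch. Assumed definitions (not given in the source):
      MaxTF(i, t): highest relative frequency of word i on any single page of class t
      PF(i, t):    number of pages of class t that contain word i
    class_pages is a list of token lists, one per page of the class.
    """
    max_tf = {}
    pf = Counter()
    for tokens in class_pages:
        counts = Counter(tokens)
        total = len(tokens)
        for word, n in counts.items():
            # Relative term frequency of the word on this page.
            tf = n / total
            max_tf[word] = max(max_tf.get(word, 0.0), tf)
            # Each page contributes at most once to the page frequency.
            pf[word] += 1
    scores = {w: max_tf[w] * pf[w] ** pf_exponent for w in max_tf}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))


if __name__ == "__main__":
    pages = [
        ["invoice", "total", "total", "date"],
        ["invoice", "date", "signature"],
        ["invoice", "zebra", "zebra", "zebra"],
    ]
    print(rank_class_dictionary(pages))                 # MaxTF-PF² order
    print(rank_class_dictionary(pages, pf_exponent=0))  # plain MaxTF order
```

With `pf_exponent=0` the score reduces to plain MaxTF, so `zebra`, which is repeated on a single page only, ranks first. With the squared page frequency, `invoice`, which appears on every page of the toy class, moves to the top. This matches the effect the passage describes.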
“…The document classification components of the WISDOM++ system (Altamura et al., 2001) are based on first-order learning algorithms (Esposito et al., 2000). Another advantage of such systems is their flexibility compared to the non-learning based systems.…”
Section: Introduction (mentioning)
“…Markup languages are a good example of representation means with such qualities. The system presented in (Worring and Smeulders, 1999) uses HTML as its final output form, while Altamura et al. (2001) use XML. More abstract representations are labeled and weighted graphs.…”
Section: Introduction (mentioning)
“…In this paper we present the multi-page DIA system WISDOM++ (http://www.di.uniba.it/~malerba/wisdom++/), whose architecture is knowledge-based and supports all the processing steps required for semantic indexing and storing in XML format [1]. More precisely, the transformation process performed by WISDOM++ consists of the preprocessing of the raster image of a scanned paper document, the segmentation of the preprocessed raster image into basic layout components, the classification of basic layout components according to the type of content (e.g., text, graphics, etc.…”
Section: Introduction (mentioning)
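The final storage step of the pipeline described above can be illustrated with a short sketch. It is hypothetical and does not reproduce WISDOM++'s actual output format: the quoted text does not describe the XML schema, so the `LayoutComponent` record, the `document`/`page`/`component` element names and the top-to-bottom, left-to-right ordering heuristic are all assumptions. It presumes that preprocessing, segmentation and classification have already produced, for each page, a list of layout components with bounding boxes and a content type. It uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class LayoutComponent:
    """Hypothetical output of the segmentation and classification stages."""
    x: int              # bounding box, in pixels of the scanned page
    y: int
    width: int
    height: int
    content_type: str   # e.g. "text" or "graphics"
    label: str = ""     # optional logical label such as "title" (assumed)
    text: str = ""      # recognized text, if any (assumed)


def pages_to_xml(pages):
    """Serialize the classified components of a multi-page document to XML.

    pages is a list with one list of LayoutComponent objects per page.
    """
    root = ET.Element("document")
    for page_no, components in enumerate(pages, start=1):
        page_el = ET.SubElement(root, "page", number=str(page_no))
        # Approximate reading order: top-to-bottom, then left-to-right (assumed heuristic).
        for comp in sorted(components, key=lambda c: (c.y, c.x)):
            attrs = {
                "type": comp.content_type,
                "x": str(comp.x),
                "y": str(comp.y),
                "width": str(comp.width),
                "height": str(comp.height),
            }
            if comp.label:
                attrs["label"] = comp.label
            el = ET.SubElement(page_el, "component", attrs)
            if comp.text:
                el.text = comp.text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = [
        LayoutComponent(50, 300, 400, 80, "text", text="Body paragraph"),
        LayoutComponent(50, 40, 400, 30, "text", label="title", text="Title"),
    ]
    print(pages_to_xml([page]))
```

Keeping the bounding box as attributes preserves the physical layout, while the `type` and optional `label` attributes carry the results of the classification stages. A real system would map these results onto a richer, document-class-specific schema.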