2006
DOI: 10.1007/11669487_25

Towards Versatile Document Analysis Systems

Abstract: The research goal of highly versatile document analysis systems, capable of performing useful functions on the great majority of document images, seems to be receding, even in the face of decades of research. One family of nearly universally applicable capabilities includes document image content extraction tools able to locate regions containing handwriting, machine-print text, graphics, line-art, logos, photographs, noise, etc. To solve this problem in its full generality requires coping with a vast diversit…

Citations: cited by 17 publications (12 citation statements)
References: 8 publications (7 reference statements)
“…We conduct experiments in the document image content extraction framework [10,1]. In this framework, each pixel is treated as a sample.…”
Section: Methods; citation type: mentioning
confidence: 99%
“…We conducted the experiments in a document image segmentation framework, by trying to classify pixels into machine-print or handwriting [10,1]. A sequence of six classifiers trained with the augmented feature sets exhibits a monotonically decreasing error rate.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…For example, Baird and Casey report that the three CD-ROM University of Washington database cost about two million dollars [7][20]. Copyright considerations can be a barrier to making a data set available e.g.…”
Section: Issue 10; citation type: mentioning
confidence: 99%
“…We have applied our algorithm to a document image content extraction problem: finding regions containing machine-printed text, handwriting, photographs, etc in images of documents [4,5,2,1]. We select a small window around each pixel as our sample space and look for some linear combination of pixel values within this window as our new feature.…”
Section: Document Content Image Extraction (Dice); citation type: mentioning
confidence: 99%
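
The last citation statement describes taking a small window around each pixel as a sample and using a linear combination of the pixel values in that window as a new feature. A minimal sketch of that idea follows. It is not the cited authors' code: the function names `window_samples` and `linear_feature`, the zero padding at the image border, the toy image, and the averaging weights are all assumptions made for illustration.

```python
def window_samples(image, radius=1):
    """Return one sample per pixel: the flattened (2r+1)x(2r+1) window around it.

    Pixels that fall outside the image are treated as 0 (zero padding is an
    assumption of this sketch, not something stated in the source).
    """
    height, width = len(image), len(image[0])
    samples = []
    for row in range(height):
        for col in range(width):
            window = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    r, c = row + dr, col + dc
                    inside = 0 <= r < height and 0 <= c < width
                    window.append(image[r][c] if inside else 0)
            samples.append(window)
    return samples


def linear_feature(sample, weights):
    """One new feature: a linear combination of the pixel values in the window."""
    return sum(w * v for w, v in zip(weights, sample))


# Tiny 3x3 grayscale image (made-up values, for illustration only).
image = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
samples = window_samples(image, radius=1)

# Hypothetical weights: a plain window average. A learned feature would
# replace these with weights chosen by some training procedure.
mean_weights = [1 / 9] * 9
features = [linear_feature(s, mean_weights) for s in samples]
```

Here every pixel yields exactly one sample, so a 3x3 image produces nine 9-element samples and nine feature values, which matches the "each pixel is treated as a sample" framing in the statements above.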