2001
DOI: 10.1007/pl00010904

Automatic document classification and indexing in high-volume applications

Abstract: In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling, bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterog…

Cited by 32 publications (21 citation statements)
References 12 publications
“…However, whereas RLSA filtering has proved its efficiency in segmenting textual documents, its use in graphics-rich documents is less frequent; one of the few methods we are aware of is that of Lu [13]. Other methods used for text-rich documents include those based on white streams [18] and the top-down methods using some kind of X-Y decomposition of the document [1,16].…”
Section: Introduction
confidence: 99%
“…The first one concerns data-based systems and the second one concerns model-based systems. Data-based systems are usually used in heterogeneous document flows and extract different information from documents, such as tables [26], graphical features such as logos and trademarks [27], or the general layout [23]. On the contrary, model-based systems are used in homogeneous document flows, where similar documents arrive generally one after the other [28][29][30][31].…”
Section: Document Image Similarity
confidence: 99%
“…The documents of the same class can share some interesting information such as the background color, the document layout, the position of the relevant information on the image, or metadata, such as the seller. Once one document is grouped into a class, an specific processing for extracting the desired information can be designed depending on these features [23]. A simple way to define a class consists in taking a representative image.…”
Section: Overview
confidence: 99%