Proceedings of the Second International Conference on Systems Integration
DOI: 10.1109/icsi.1992.217295

A document segmentation, classification and recognition system

Abstract: This paper proposes a document segmentation, classification and recognition system for automatically reading daily-received office documents that have complex layout structures, such as multiple columns and mixed-mode contents of text, graphics and half-tone pictures. First, the block segmentation employs a two-step run-length smoothing algorithm to decompose any document into single-mode blocks. Next, based on clustering rules, the block classification classifies each block into one of text, horizontal or ver…
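To make the run-length smoothing idea concrete, here is a minimal sketch of the classic run-length smoothing algorithm (RLSA) on a binary image, where 1 is a black pixel and 0 is white. It is not the paper's two-step variant, whose details the abstract does not give. The function names, the thresholds, and the list-of-lists image representation are all assumptions made for illustration.

```python
def smooth_run(row, threshold):
    """Fill white (0) runs of length <= threshold that lie between two black (1) pixels."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def rlsa_horizontal(image, threshold):
    """Smooth every row independently."""
    return [smooth_run(row, threshold) for row in image]


def rlsa_vertical(image, threshold):
    """Smooth every column by transposing, smoothing rows, and transposing back."""
    columns = [list(col) for col in zip(*image)]
    smoothed = [smooth_run(col, threshold) for col in columns]
    return [list(row) for row in zip(*smoothed)]


def rlsa(image, h_threshold, v_threshold):
    """Combine horizontal and vertical smoothing with a pixel-wise AND (classic RLSA)."""
    horiz = rlsa_horizontal(image, h_threshold)
    vert = rlsa_vertical(image, v_threshold)
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]
```

With suitable thresholds, nearby characters merge into solid regions, and each connected region can then be treated as a candidate block. White runs that touch the image border are left untouched because they are not enclosed by black pixels on both sides.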

Cited by 22 publications (7 citation statements)
References 21 publications
“…Approaches used to tackle this problem can be divided into top-down and bottom-up. The most common top-down techniques are run-length smoothing [20]-[21] and projection profiles [22]-[25]. Top-down approaches split the image into regions, which are later identified and further split into text columns, then paragraphs, text lines and finally words [23].…”
Section: Text / Graphical Segmentation (mentioning)
confidence: 99%
“…Top-down approaches split the document into blocks, columns, paragraphs, text lines, or even words. The most common top-down techniques are run-length smoothing [2], [3] and projection profiles [4]. These methods are not suitable for skewed texts because top-down methods are restricted only to rectangular blocks.…”
Section: Introduction (mentioning)
confidence: 99%
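The projection-profile technique named in the statement above can be sketched as follows: count the black pixels in each row of a binary image, then cut the page wherever the count drops to zero. This is an illustrative sketch, not code from any of the cited works. The function names and the `min_count` parameter are assumptions.

```python
def horizontal_profile(image):
    """Number of black (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def split_bands(profile, min_count=1):
    """Return (first_row, last_row) pairs for runs of rows with at least min_count black pixels."""
    bands = []
    start = None  # first row of the band currently being scanned, if any
    for i, count in enumerate(profile):
        if count >= min_count and start is None:
            start = i
        elif count < min_count and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands
```

Because every cut is a straight horizontal (or, with the roles of rows and columns swapped, vertical) line, the resulting regions are always axis-aligned rectangles, which is consistent with the statement's remark that such methods struggle with skewed text.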
“…Tsai (1985) proposed an approach to automatic threshold selection. Some other systems based on the prior knowledge of some statistical properties of various blocks (Fisher et al., 1990; Akiyama and Hagita, 1990; Shih et al., 1992; Pavlidis and Zhou, 1992; Zlatopolsky, 1994), or texture analyses (Wang and Srihari, 1989; Jain and Bhattacharjee, 1992) have also been developed. Suen and Wang (1996) presented a text string extraction algorithm, which uses the edge-detection technique and text block identification to extract text strings.…”
Section: Introduction (mentioning)
confidence: 99%