Gary E. Kopec scite author profile

This paper describes a communication theory approach to document image recognition, patterned after the use of hidden Markov models in speech recognition. In general, a document recognition problem is viewed as consisting of three elements: an image generator, a noisy channel, and an image decoder. A document image generator is a Markov source (stochastic finite-state automaton) that combines a message source with an imager. The message source produces a string of symbols, or text, that contains the information to be transmitted. The imager is modeled as a finite-state transducer that converts the one-dimensional message string into an ideal two-dimensional bitmap. The channel transforms the ideal image into a noisy observed image. The decoder estimates the message, given the observed image, by finding the a posteriori most probable path through the combined source and channel models using a Viterbi-like dynamic programming algorithm. The proposed approach is illustrated on the problem of decoding scanned telephone yellow pages to extract names and numbers from the listings. A finite-state model for yellow page columns was constructed and used to decode a database of scanned column images containing about 1100 individual listings. Overall, 99.5% of the listings were correctly recognized, with character classification rates of 98% and 99.6%, respectively, for the names and numbers.


Measuring document image skew and orientation

Bloomberg

Kopec

Dasari

1995


Several approaches have previously been taken for identifying document image skew. At issue are efficiency, accuracy, and robustness. We work directly with the image, maximizing a function of the number of ON pixels in a scanline. Image rotation is simulated by either vertical shear or accumulation of pixel counts along sloped lines. Pixel sum differences on adjacent scanlines reduce isotropic background noise from non-text regions. To find the skew angle, a succession of values of this function are found. Angles are chosen hierarchically, typically with both a coarse sweep and a fine angular bifurcation. To increase efficiency, measurements are made on subsampled images that have been pre-filtered to maximize sensitivity to image skew. Results are given for a large set of images, including multiple and unaligned text columns, graphics and large area halftones. The measured intrinsic angular error is inversely proportional to the number of sampling points on a scanline. This method does not indicate when text is upside-down, and it also requires sampling the function at 90 degrees of rotation to measure text skew in landscape mode. However, such text orientation can be determined (as one of four directions) by noting that roman characters in all languages have many more ascenders than descenders, and using morphological operations to identify such pixels. Only a small amount of text is required for accurate statistical determination of orientation, and images without text are identified as such.
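The search described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact function being maximized, so the sum of squared differences between adjacent scanline pixel counts is an assumption, as are the function names, the angle range, and the step sizes. The subsampling and pre-filtering steps are omitted.

```python
import numpy as np


def shear_vertical(img, angle_deg):
    """Simulate a small rotation by shifting each column vertically.

    Columns wrap around at the image border (np.roll), which keeps the
    total ON-pixel count unchanged; acceptable for a sketch.
    """
    h, w = img.shape
    out = np.zeros_like(img)
    t = np.tan(np.radians(angle_deg))
    for x in range(w):
        shift = int(round((x - w / 2) * t))
        out[:, x] = np.roll(img[:, x], shift)
    return out


def skew_score(img, angle_deg):
    """Assumed score: sum of squared differences of adjacent scanline sums.

    The score peaks when text lines are horizontal, because the ON-pixel
    count then changes abruptly between text rows and the gaps between them.
    """
    sums = shear_vertical(img, angle_deg).sum(axis=1).astype(np.int64)
    return int(np.sum(np.diff(sums) ** 2))


def estimate_skew(img, max_angle=5.0, coarse_step=1.0, fine_tol=0.05):
    """Return the shear angle (degrees) that best straightens the image."""
    # Coarse sweep over the whole angular range.
    angles = np.arange(-max_angle, max_angle + coarse_step / 2, coarse_step)
    best = max(angles, key=lambda a: skew_score(img, a))
    # Fine search: halve the step around the current best angle.
    step = coarse_step / 2
    while step >= fine_tol:
        candidates = (best, best - step, best + step)
        best = max(candidates, key=lambda a: skew_score(img, a))
        step /= 2
    return float(best)
```

Here `img` is assumed to be a 2-D array with ON pixels set to 1. The returned value is the correction angle, so an image skewed by a given angle should yield roughly the negative of that angle.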


Document image decoding using Markov source models

Kopec

Chou

1993


This paper describes a communication theory approach to document image recognition, patterned after the use of hidden Markov models in speech recognition. A document recognition problem is viewed as consisting of three elements: an image generator, a noisy channel, and an image decoder. A document image generator is a Markov source which combines a message source with an imager. The message source produces a string of symbols which contains the information to be transmitted. The imager is modeled as a finite-state transducer which converts the message into an ideal bitmap. The channel transforms the ideal image into a noisy observed image. The decoder estimates the message from the observed image by finding the a posteriori most probable path through the combined source and channel models using a Viterbi-like algorithm. Application of the proposed method to decoding telephone yellow pages is described.
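The idea of finding the a posteriori most probable path can be illustrated with the textbook Viterbi algorithm for a discrete hidden Markov model. This is a generic sketch, not the paper's decoder, which operates on two-dimensional images and a finite-state image source. The function name and the log-probability arguments are illustrative assumptions.

```python
import numpy as np


def viterbi(log_init, log_trans, log_emit, observations):
    """Most probable state path for a discrete HMM, in the log domain.

    log_init[i]     : log P(first state = i)
    log_trans[i, j] : log P(next state = j | current state = i)
    log_emit[i, k]  : log P(observed symbol = k | state = i)
    observations    : sequence of observed symbol indices
    """
    n_states = log_init.shape[0]
    T = len(observations)
    score = np.full((T, n_states), -np.inf)   # best log score per state
    back = np.zeros((T, n_states), dtype=int)  # best predecessor per state

    score[0] = log_init + log_emit[:, observations[0]]
    for t in range(1, T):
        # cand[i, j]: best path ending in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = (cand[back[t], np.arange(n_states)]
                    + log_emit[:, observations[t]])

    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Working in the log domain avoids numerical underflow on long sequences; the dynamic program keeps only the best-scoring path into each state at each step, so the cost grows linearly with the sequence length.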


Speech analysis homomorphic prediction

Kopec

Oppenheim

Tribolet

1977

IEEE Trans. Acoust., Speech, Signal Process.


Phase in speech and pictures

Oppenheim

Lim

Kopec

et al.


Signal analysis by homomorphic prediction

Oppenheim

Kopec

Tribolet

1976

IEEE Trans. Acoust., Speech, Signal Process.


Formant tracking using hidden Markov models and vector quantization

Kopec

1986

IEEE Trans. Acoust., Speech, Signal Process.


Supervised template estimation for document image decoding

Kopec

Lomelin

1997

IEEE Trans. Pattern Anal. Machine Intell.


An approach to supervised training of character templates from page images and unaligned transcriptions is proposed. The template training problem is formulated as one of constrained maximum likelihood parameter estimation within the document image decoding framework. This leads to a three-phase iterative training algorithm consisting of transcription alignment, aligned template estimation (ATE) and channel estimation steps. The maximum likelihood ATE problem is shown to be NP-complete and thus an approximate solution approach is developed. An evaluation of the training procedure in a document-specific decoding task using the Univ. of Washington UW-II database of scanned technical journal articles is described.

