The huge amount of storage needed for document images is a major hindrance to widespread use of document image processing (DIP) systems. Although current DIP systems store document images in compressed form, there is much room for improvement. In this paper, a nearly-lossless document image compression method is investigated which preserves the relevant information of a document. The proposed approach is based on the segmentation of a document image into different blocks that are classified into one of several block classes and compressed by a block-class-specific (BCS) data compression method. Whereas image and graphics blocks are compressed using standard image compression methods, text blocks are fed into a text and font recognition module and converted into their textual representation. Finally, text blocks are compressed by encoding their textual representation and enough formatting information to be able to render them as faithfully as possible to the original document. Preliminary results show that (1) the achievable compression ratios compare favourably with standard document image compression methods for all document images tested and (2) the quality of the decompressed image depends on the recognition accuracy of the text recognition module.
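The block-class-specific dispatch described above can be illustrated with a minimal sketch. It is not the paper's implementation: the block dictionary layout (`cls`, `text`, `format`, `pixels`) and the function names are hypothetical, segmentation and text recognition are assumed to have already run, and `zlib` stands in for both the text encoder and the standard image codec.

```python
import json
import zlib


def compress_block(block):
    """Compress one already-segmented block with a method chosen by its class."""
    if block["cls"] == "text":
        # Text blocks: keep the recognized text and formatting, not the pixels.
        payload = json.dumps(
            {"text": block["text"], "format": block["format"]}
        ).encode("utf-8")
    else:
        # Image/graphics blocks: zlib is only a stand-in for an image codec.
        payload = block["pixels"]
    return block["cls"], zlib.compress(payload)


def decompress_block(cls, data):
    """Invert compress_block; text blocks come back as text plus formatting."""
    raw = zlib.decompress(data)
    if cls == "text":
        return json.loads(raw.decode("utf-8"))
    return raw


# Hypothetical blocks, as a segmentation and recognition stage might emit them.
blocks = [
    {"cls": "text", "text": "Abstract", "format": {"font": "Times", "size": 12}},
    {"cls": "image", "pixels": bytes(64)},
]
restored = [decompress_block(*compress_block(b)) for b in blocks]
```

In this sketch the text block is restored only as text plus formatting, so the fidelity of a re-rendered page would depend on how accurate the recognition step was, as the abstract notes.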
The number of operations in the coding part of adaptive arithmetic coding is independent of the number of symbols. In a traditional implementation of the adaptive part, however, the number of operations increases linearly with the number of symbols. The adaptive updating of the model therefore consumes the vast majority of the computational operations when the number of symbols is large, as is typical in image coding. This paper presents a fast alternative that implements the adaptive part hierarchically, so that the number of operations depends only logarithmically on the number of symbols.
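To make the logarithmic cost concrete, the following sketch keeps the adaptive symbol counts in a binary indexed (Fenwick) tree. This is a well-known hierarchical structure chosen here for illustration; the abstract does not say which hierarchy the paper uses, so the class and method names below are assumptions rather than the authors' design.

```python
class FenwickFrequencyTable:
    """Adaptive symbol counts with O(log n) update and cumulative lookup."""

    def __init__(self, num_symbols):
        self.n = num_symbols
        self.tree = [0] * (num_symbols + 1)  # index 0 is unused
        for s in range(num_symbols):
            self.increment(s)  # every symbol starts with count 1

    def increment(self, symbol, delta=1):
        """Add delta to one symbol's count, touching O(log n) nodes."""
        i = symbol + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node covering this symbol

    def cumulative(self, symbol):
        """Sum of the counts of all symbols strictly below `symbol`."""
        i = symbol
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit
        return total

    def interval(self, symbol):
        """Return (low, high, total), the range an arithmetic coder needs."""
        low = self.cumulative(symbol)
        high = self.cumulative(symbol + 1)
        return low, high, self.cumulative(self.n)


table = FenwickFrequencyTable(8)
before = table.interval(3)   # (3, 4, 8): uniform initial model
table.increment(3)           # adapt the model after coding symbol 3
after = table.interval(3)    # (3, 5, 9)
```

A flat array of cumulative counts would have to shift every entry above the coded symbol on each update, which is the linear cost the abstract refers to. Here both `increment` and `cumulative` visit at most about log2(n) nodes.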