Image coding using the wavelet transform, the DCT and similar transform techniques is well established. However, these coding methods neither take into account the special characteristics of the images in a database nor are they suitable for fast database search. In this paper, the digital archiving of Ottoman printings is considered. Ottoman documents are printed in Arabic letters. In [1], Witten et al. describe a scheme based on finding the characters in binary document images and encoding the positions of the repeated characters. This method compresses document images efficiently and is suitable for database search, but it cannot be applied to Ottoman or Arabic documents because the concept of a character is different in these scripts. Typically, one has to deal with compound structures consisting of a group of letters. Therefore, the matching criterion is based on those compound structures. Furthermore, the text images of Ottoman scripts are gray tone or color images, for reasons described in the paper. In our method, compound structure matching is carried out in the wavelet domain, which reduces the search space and increases the compression ratio. In addition to the wavelet transform, which corresponds to linear subband decomposition, we also use nonlinear subband decomposition. The filters in the nonlinear subband decomposition have the property of preserving edges in the low resolution subband image.

Keywords: Textual Image Coding, Document Imaging, Image Databases, Wavelet Transforms.
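The claim that matching in the wavelet domain reduces the search space can be illustrated with a minimal sketch. It assumes a one-level Haar-style low-pass (2x2 block averaging) as the linear subband decomposition; the paper does not state which wavelet is used, and the names `haar_lowpass` and `count_positions` are hypothetical helpers introduced only for this illustration.

```python
def haar_lowpass(image):
    """Low-low subband of a one-level 2-D Haar-style decomposition.

    Each output pixel is the mean of a 2x2 block of the input, which is
    the Haar low-pass filter up to a scale factor (an assumed choice).
    `image` is a list of equal-length rows of gray levels; an odd
    trailing row or column is ignored.
    """
    rows = len(image) // 2
    cols = len(image[0]) // 2 if image else 0
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def count_positions(image, template):
    """Number of top-left positions an exhaustive template search must test."""
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    return max(h - th + 1, 0) * max(w - tw + 1, 0)


# Synthetic 64x64 "page" and a 16x16 "mark" cut from it (illustrative data only).
page = [[(x * y) % 256 for x in range(64)] for y in range(64)]
mark = [row[:16] for row in page[:16]]

full = count_positions(page, mark)
low = count_positions(haar_lowpass(page), haar_lowpass(mark))
print(full, low)  # 2401 625
```

In this toy setting the low resolution subband leaves roughly a quarter of the candidate positions, and each comparison also touches a quarter of the pixels. An actual system would likely refine the matches found at low resolution against the finer subbands.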
TEXTUAL IMAGE ARCHIVING AND COMPRESSION

An Ottoman document image mainly consists of printed text and some marks and drawings. There can also be gray tone and color images inside the document. Usually, the marks, shady areas and ink smears on the page are important for a historian. As a result, the scanned image should be kept in gray tone or color format for Ottoman archives.

Exploiting the redundancy of repeated character images is the key step in most textual image coding algorithms. A good way to take advantage of this redundancy is to encode the repeated character images and their locations. This method compresses the textual image efficiently and is appropriate for fast database search. Since the character images are preserved, keyword search is available via the individual characters and their locations.

The procedure of textual image coding can be described as the following sequence:
1) Find and extract a mark in the image.
2) Add it to the library constructed from these mark images.
3) Find the locations of the marks inside the image that are similar to the extracted one, and remove those repetitions from the image.
4) Go to 1 until all marks in the image are deleted.
5) Compress (i) the constructed library and (ii) the symbol locations.

This operation is illustrated in Fig. 1, where the repetitions of the letter "waw" are found. Note that the letter can be connected to other compound structures.

SPIE Vol. 2727 / 569

A further sixth step in this procedure is proposed in [1] to encode the residue image and p...