Two widely used textbase indexing methods are the inverted index and the Superimposed Coding based Signature File (SC-SF). The former is the more efficient in query processing, whereas the latter excels in storage utilization. Building on previous results, we propose a new hybrid structure (S-Index) with tunable performance. At one extreme, S-Index becomes a signature file with zero information loss, so that queries are processed faster than in an ordinary SC-SF. At the other extreme, S-Index becomes an inverted index. The advantage of the proposed access method is that the textbase index can be tailored to the query profiles of user classes: for frequently queried textbase sections S-Index performs like an inverted index, whereas the bulk of the textbase is indexed in the form of a signature file. The S-Index structure is presented in detail, together with performance analysis results.
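The trade-off between the two baseline structures can be illustrated with a minimal sketch. It does not implement S-Index, whose details are not given here; it only contrasts a plain inverted index with a superimposed-coding signature per text block. All names and parameters (`SIG_BITS`, `BITS_PER_WORD`, the SHA-256 based bit selection, the sample blocks) are assumptions chosen for illustration.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (illustrative assumption)
BITS_PER_WORD = 3   # bit positions selected per word (illustrative assumption)


def word_signature(word):
    """Return a word signature with up to BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        pos = int.from_bytes(digest[:4], "big") % SIG_BITS
        sig |= 1 << pos
    return sig


def block_signature(words):
    """Superimpose (bitwise OR) the word signatures of one text block."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig


def build_inverted_index(blocks):
    """Map each word to the exact set of block ids that contain it."""
    index = {}
    for block_id, words in enumerate(blocks):
        for w in words:
            index.setdefault(w, set()).add(block_id)
    return index


def signature_candidates(signatures, word):
    """Return ids of blocks whose signature covers the query word's bits.

    The result may include false drops: blocks that match the bit
    pattern but do not actually contain the word.
    """
    q = word_signature(word)
    return [i for i, s in enumerate(signatures) if s & q == q]


# Hypothetical sample textbase of two blocks.
blocks = [["inverted", "index", "query"], ["signature", "file", "storage"]]
signatures = [block_signature(b) for b in blocks]
inverted = build_inverted_index(blocks)
```

The inverted index answers a query with the exact block set but stores one entry per distinct word, whereas the signature file stores a fixed-width bit pattern per block and returns a candidate set that must still be checked against the text. A zero-information-loss signature, as described above, would remove that checking step.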
A new methodology is introduced in which blocks of text are replaced by a compressed, fully reversible signature pattern. Full reversibility implies zero information loss, hence the new method is termed Perfect Encoding. The method's analytical model is derived and, where applicable, contrasted with current practice in signature file organizations. Perfect Encoding is shown to represent optimal signature file performance with respect to (a) information loss minimization and (b) information compression maximization. In this respect, it can serve as a framework for measuring the performance of signature file based information encoding structures. In addition, the new method has the potential to develop into a scheme that is an alternative or a complement to inverted index and signature file based systems.