In this article we investigate the use of signature files in Chinese information retrieval system and propose a new partitioning method for Chinese signature file based on the characteristic of Chinese words. Our partitioning method, called Partitioned Signature File for Chinese (PSFC), offers faster search efficiency than the traditional single signature file approach. We devise a general scheme for controlling the trade‐off between the false drop and storage overhead while maintaining the search space reduction in PSFC. An analytical study is presented to support the claims of our method. We also propose two new hashing methods for Chinese signature files so that the signature file will be more suitable for dynamic environment while the retrieval performance is maintained. Furthermore, we have implemented PSFC and the new hashing methods, and we evaluated them using a large‐scale real‐world Chinese document corpus, namely, the TREC‐5 (Text REtrieval Conference) Chinese collection. The experimental results confirm the features of PSFC and demonstrate its superiority over the traditional single signature file method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.