The first part of this paper reports a comparative study of the document classifications produced by the single linkage, complete linkage, group average, and Ward clustering methods. Studies of cluster membership and of the effectiveness of cluster searches support previous findings that the single linkage classifications are rather different from those produced by the other three methods. The latter three methods all produce large numbers of small clusters, many of which contain just a pair of documents. This finding motivates the work reported in the second part of the paper, which considers the use of clusters consisting of a document together with the document to which it is most similar. A comparison of searches of such clusters with conventional best match searches, using seven document test collections, suggests that the two types of search are of comparable effectiveness but retrieve noticeably different sets of relevant documents.
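The pair clusters described above can be illustrated with a minimal sketch. It is not the paper's implementation: the cosine coefficient, the unweighted term dictionaries, the toy documents, and the function names `cosine` and `nearest_neighbour_clusters` are all assumptions made for illustration, and the abstract does not state which similarity measure or term weighting was used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbour_clusters(docs):
    """Pair each document with the other document to which it is most similar.

    Requires at least two documents. Ties go to the first candidate encountered.
    """
    clusters = {}
    for doc_id, vector in docs.items():
        others = (other for other in docs if other != doc_id)
        best = max(others, key=lambda other: cosine(vector, docs[other]))
        clusters[doc_id] = (doc_id, best)
    return clusters


# Hypothetical toy collection: two documents on clustering, two on searching.
docs = {
    "d1": {"cluster": 1, "linkage": 1},
    "d2": {"cluster": 1, "linkage": 1, "ward": 1},
    "d3": {"retrieval": 1, "query": 1},
    "d4": {"retrieval": 1, "query": 1, "match": 1},
}
clusters = nearest_neighbour_clusters(docs)
```

Each document yields exactly one cluster, so a collection of N documents gives N overlapping pairs rather than a hierarchy. This sketch computes all pairwise similarities, which is only practical for small collections.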