Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.
Abstract. Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
We live in a ''small world,'' where two arbitrary people are likely connected by a short chain of intermediate friends. With scant information about a target individual, people can successively forward a message along such a chain. Experimental studies have verified this property in real social networks, and theoretical models have been advanced to explain it. However, existing theoretical models have not been shown to capture behavior in real-world social networks. Here, we introduce a richer model relating geography and social-network friendship, in which the probability of befriending a particular person is inversely proportional to the number of closer people. In a large social network, we show that one-third of the friendships are independent of geography and the remainder exhibit the proposed relationship. Further, we prove analytically that short chains can be discovered in every network exhibiting the relationship.routing algorithms ͉ small worlds ͉ population networks ͉ rank-based friendships ͉ six degrees of separation
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.