A major challenge in developing models for hypertext retrieval is to effectively combine content information with the link structure available in hypertext collections. Although several link-based ranking methods have been developed to improve retrieval results, none of them fully exploits both the discriminative power of content and all useful link structures. In this paper, we propose a general relevance propagation framework for combining content and link information. The framework assigns each document a probabilistic score defined by a probabilistic surfing model. The two main characteristics of our framework are its probabilistic view of relevance propagation and its propagation through multiple sets of neighbors. We compare eight models derived from the probabilistic relevance propagation framework on two standard TREC Web test collections. Our results show that all eight relevance propagation models can outperform the baseline content-only ranking method over a wide range of parameter values, indicating that the relevance propagation framework provides a general, effective, and robust way of exploiting link information. Our experiments also show that using multiple neighbor sets significantly outperforms using just one type of neighbor, and that taking a probabilistic view of propagation provides guidance on setting propagation parameters.
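The abstract does not state the propagation formula, so the following is only a minimal sketch of one way such a framework could look, not the authors' models. It assumes two neighbor sets (in-links and out-links), uniform transition probabilities, and hypothetical mixing weights `alpha`, `beta_in`, and `beta_out` that sum to 1 so that each score stays a probabilistic mixture. The names `propagate` and `reverse` are illustrative.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # document -> documents it links to


def reverse(graph: Graph) -> Graph:
    """Flip every edge, turning out-links into in-links."""
    flipped: Graph = {doc: [] for doc in graph}
    for src, targets in graph.items():
        for dst in targets:
            flipped.setdefault(dst, []).append(src)
    return flipped


def propagate(
    content: Dict[str, float],
    out_links: Graph,
    alpha: float = 0.6,      # weight on the document's own content score (assumed)
    beta_in: float = 0.2,    # weight on relevance arriving via in-links (assumed)
    beta_out: float = 0.2,   # weight on relevance arriving via out-links (assumed)
    iterations: int = 50,
) -> Dict[str, float]:
    """Mix content scores with scores propagated from two neighbor sets.

    Assumes every linked document has an entry in `content` and that the
    content scores are non-negative with a positive sum.
    """
    if abs(alpha + beta_in + beta_out - 1.0) > 1e-9:
        raise ValueError("alpha, beta_in and beta_out must sum to 1")

    in_links = reverse(out_links)
    total = sum(content.values())
    base = {doc: score / total for doc, score in content.items()}
    scores = dict(base)

    for _ in range(iterations):
        updated = {}
        for doc in base:
            # Each source page splits its score uniformly over its out-links.
            from_in = sum(
                scores[src] / len(out_links[src]) for src in in_links.get(doc, [])
            )
            # Each target page splits its score uniformly over its in-links.
            from_out = sum(
                scores[dst] / len(in_links[dst]) for dst in out_links.get(doc, [])
            )
            updated[doc] = alpha * base[doc] + beta_in * from_in + beta_out * from_out
        scores = updated
    return scores
```

Setting `beta_in` and `beta_out` to 0 reduces the sketch to the content-only baseline, which gives a convenient way to compare propagation against content-only ranking on the same scores.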
Query expansion is a method for alleviating the vocabulary mismatch problem in information retrieval tasks. Previous work has shown that terms selected for query expansion by traditional methods such as pseudo-relevance feedback are not always helpful to the retrieval process. In this paper, we show that this is also true for more recently proposed embedding-based query expansion methods. We then introduce an artificial neural network classifier that predicts the usefulness of query expansion terms, using the terms' word embeddings as inputs. Experiments on four TREC newswire and web collections show that expanding queries with terms selected by the classifier significantly improves retrieval performance compared to competitive baselines. The results are also shown to be more robust than those of the baselines.
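The abstract does not describe the classifier's architecture or training, so the sketch below only illustrates the filtering step under stated assumptions: a single hidden layer with tanh units, a sigmoid output, and weights that were already trained elsewhere. The names `usefulness` and `select_expansion_terms`, the parameter layout, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# (hidden_weights, hidden_bias, out_weights, out_bias); assumed to be pre-trained.
Params = Tuple[Sequence[Vector], Vector, Vector, float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def usefulness(embedding: Vector, params: Params) -> float:
    """Forward pass: term embedding -> probability that the term helps retrieval."""
    hidden_weights, hidden_bias, out_weights, out_bias = params
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(hidden_weights, hidden_bias)
    ]
    logit = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return sigmoid(logit)


def select_expansion_terms(
    candidates: List[str],
    embeddings: Dict[str, Vector],
    params: Params,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep only the candidate terms the classifier scores at or above the threshold."""
    kept = []
    for term in candidates:
        vector = embeddings.get(term)
        if vector is None:
            continue  # skip terms without an embedding
        if usefulness(vector, params) >= threshold:
            kept.append(term)
    return kept
```

In use, the candidates would come from an existing expansion method such as pseudo-relevance feedback, and only the terms that pass the filter would be appended to the query.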