Maarten de Rijke scite author profile

Formal models for expert finding in enterprise corpora

Searching an organization's document repositories for experts provides a cost-effective solution for the task of expert finding. We present two general strategies for expert searching given a document collection, both formalized using generative probabilistic models. The first directly models an expert's knowledge based on the documents that they are associated with, whilst the second locates documents on topic and then finds the associated experts. Forming reliable associations is crucial to the performance of expert finding systems. Consequently, in our evaluation we compare the different approaches, exploring a variety of associations along with other operational parameters (such as topicality). Using the TREC Enterprise corpora, we show that the second strategy consistently outperforms the first. A comparison against other unsupervised techniques reveals that our second model delivers excellent performance.
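The two strategies in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy documents, the association weights, the smoothing weight `LAMBDA`, and the exact estimation formulas are not taken from the paper. The sketch only shows the structural difference between scoring a per-candidate term distribution and summing per-document query scores.

```python
from collections import Counter

# Toy data (hypothetical): tokenised documents and document-candidate associations.
docs = {
    "d1": "xml query language xpath".split(),
    "d2": "blog search query log".split(),
    "d3": "xpath logic trees".split(),
}
assoc = {  # assoc[doc][candidate] = association strength (assumed values)
    "d1": {"alice": 1.0},
    "d2": {"bob": 1.0},
    "d3": {"alice": 0.5, "bob": 0.5},
}
LAMBDA = 0.5  # smoothing weight, chosen arbitrarily for this sketch

collection = Counter(t for toks in docs.values() for t in toks)
coll_len = sum(collection.values())


def p_term_doc(term, doc_id):
    """Smoothed p(term | document): mix of document and collection frequencies."""
    toks = docs[doc_id]
    p_doc = toks.count(term) / len(toks)
    p_coll = collection[term] / coll_len
    return (1 - LAMBDA) * p_doc + LAMBDA * p_coll


def candidate_model_score(query, cand):
    """Strategy 1: build one term distribution per candidate, then score the query."""
    linked = [d for d in docs if cand in assoc[d]]
    norm = sum(assoc[d][cand] for d in linked)
    score = 1.0
    for term in query:
        score *= sum(p_term_doc(term, d) * assoc[d][cand] for d in linked) / norm
    return score


def document_model_score(query, cand):
    """Strategy 2: score each document against the query, then sum over
    the documents associated with the candidate."""
    total = 0.0
    for d in docs:
        if cand not in assoc[d]:
            continue
        p_query = 1.0
        for term in query:
            p_query *= p_term_doc(term, d)
        total += p_query * assoc[d][cand]
    return total


def rank(query, scorer):
    """Return all candidates ordered by descending score."""
    cands = {c for links in assoc.values() for c in links}
    return sorted(cands, key=lambda c: scorer(query, c), reverse=True)
```

Calling `rank(["xpath", "logic"], document_model_score)` orders the candidates under the second strategy; swapping in `candidate_model_score` gives the first. On real data the two orderings can differ, which is what the evaluation in the abstract compares.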

Overview of RepLab 2013: Evaluating Online Reputation Monitoring Systems

Amigó

Carrillo-de-Albornoz

Chugur

et al. 2013

Predicting the volume of comments on online news stories

2009

Online news agents provide commenting facilities for readers to express their views on news stories. The number of user-supplied comments on a news article may be indicative of its importance or impact. We report on exploratory work that predicts the comment volume of news articles prior to publication using five feature sets. We address the prediction task as a two-stage classification task: a first binary classification identifies articles with the potential to receive comments, and a second binary classification takes the output of the first step and labels articles as having "low" or "high" comment volume. The results show solid performance for the former task, while performance degrades for the latter.
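The two-stage setup described above can be sketched as a simple cascade. The feature names (`title_length`, `entity_count`) and the threshold rules below are hypothetical placeholders standing in for trained classifiers; the abstract does not name the five feature sets or the learning method.

```python
from typing import Callable, Dict

Article = Dict[str, float]
Classifier = Callable[[Article], bool]


def two_stage_label(article: Article,
                    will_get_comments: Classifier,
                    is_high_volume: Classifier) -> str:
    """Cascade: the second classifier only sees articles accepted by the first."""
    if not will_get_comments(article):
        return "none"
    return "high" if is_high_volume(article) else "low"


# Hypothetical stand-ins for trained models; features and thresholds are invented.
def stage1(article: Article) -> bool:
    return article["entity_count"] >= 2


def stage2(article: Article) -> bool:
    return article["entity_count"] >= 5 and article["title_length"] <= 60


if __name__ == "__main__":
    example = {"title_length": 48, "entity_count": 6}
    print(two_stage_label(example, stage1, stage2))
```

One consequence of this design is that errors from the first stage propagate: an article wrongly rejected in stage one never reaches the low/high decision.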

Semantic characterizations of navigational XPath

Marx

de Rijke

2005

SIGMOD Rec.

We give semantic characterizations of the expressive power of navigational XPath (a.k.a. Core XPath) in terms of first-order logic. XPath can be used to specify sets of nodes and sets of paths in an XML document tree. We consider both uses. For sets of nodes, XPath is exactly as expressive as first-order logic in two variables. For paths, XPath can be defined using four simple connectives, which together yield the class of first-order definable relations that are safe for bisimulation. Furthermore, we give a characterization of the XPath-expressible paths in terms of conjunctive queries.

A Study of Blog Search

Mishne

de Rijke

2006

114

Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval

2012

As retrieval systems become more complex, learning to rank approaches are being developed to automatically tune their parameters. Using online learning to rank, retrieval systems can learn directly from implicit feedback inferred from user interactions. In such an online setting, algorithms must obtain feedback for effective learning while simultaneously utilizing what has already been learned to produce high quality results. We formulate this challenge as an exploration-exploitation dilemma and propose two methods for addressing it. By adding mechanisms for balancing exploration and exploitation during learning, each method extends a state-of-the-art learning to rank method, one based on listwise learning and the other on pairwise learning. Using a recently developed simulation framework that allows assessment of online performance, we empirically evaluate both methods. Our results show that balancing exploration and exploitation can substantially and significantly improve the online retrieval performance of both listwise and pairwise approaches. In addition, the results demonstrate that such a balance affects the two approaches in different ways, especially when user feedback is noisy, yielding new insights relevant to making online learning to rank effective in practice.

Keywords: Information retrieval · Learning to rank · Implicit feedback

An earlier version of this article appeared in Hofmann et al. (2011a). In this substantially revised and extended version we introduce a novel approach for balancing exploration and exploitation that works with pairwise online learning approaches, and carefully evaluate this new approach. Comparisons with the earlier described algorithm for listwise approaches yield new insights into the behavior of the two types of approach in online settings, especially how they compare in the face of noisy feedback and how they react to a balance of exploration and exploitation.

PDL for ordered trees

Afanasiev

Blackburn

Dimitriou

et al. 2005

Journal of Applied Non-Classical Logics

Credibility-inspired ranking for blog post retrieval

Weerkamp

de Rijke

2012

Inf Retrieval

Credibility of information refers to its believability or the believability of its sources. We explore the impact of credibility-inspired indicators on the task of blog post retrieval, following the intuition that more credible blog posts are preferred by searchers. Based on a previously introduced credibility framework for blogs, we define several credibility indicators, and divide them into post-level (e.g., spelling, timeliness, document length) and blog-level (e.g., regularity, expertise, comments) indicators. The retrieval task at hand is precision-oriented, and we hypothesize that the use of credibility-inspired indicators will positively impact precision. We propose to use ideas from the credibility framework in a reranking approach to the blog post retrieval problem: we introduce two simple ways of reranking the top n of an initial run. The first approach, Credibility-inspired reranking, simply reranks the top n of a baseline based on the credibility-inspired score. The second approach, Combined reranking, multiplies the credibility-inspired score of the top n results by their retrieval score, and reranks based on this score. Results show that Credibility-inspired reranking leads to larger improvements over the baseline than Combined reranking, but both approaches are capable of improving over an already strong baseline. For Credibility-inspired reranking the best performance is achieved using a combination of all post-level indicators. Combined reranking works best using the post-level indicators combined with comments and pronouns. The blog-level indicators expertise, regularity, and coherence do not contribute positively to the performance, although analysis shows that they can be useful for certain topics. Additional analysis shows that a relatively small value of n (15-25) leads to the best results, and that posts that move up the ranking due to the integration of reranking based on credibility-inspired indicators do indeed appear to be more credible than the ones that go down.

Inf Retrieval (2012) 15:243-277
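The two reranking schemes described in this abstract can be sketched as follows. The tuple layout, the function names, and the example scores are assumptions made for illustration; how the credibility-inspired score itself is computed from the indicators is not shown here.

```python
from typing import Callable, List, Tuple

# (post_id, retrieval_score, credibility_score); layout is an assumption.
Post = Tuple[str, float, float]


def rerank_top_n(run: List[Post], n: int, key: Callable[[Post], float]) -> List[Post]:
    """Reorder only the first n results by `key`; the tail keeps its original order."""
    head = sorted(run[:n], key=key, reverse=True)
    return head + run[n:]


def credibility_rerank(run: List[Post], n: int) -> List[Post]:
    """Credibility-inspired reranking: order the top n by credibility score alone."""
    return rerank_top_n(run, n, key=lambda p: p[2])


def combined_rerank(run: List[Post], n: int) -> List[Post]:
    """Combined reranking: order the top n by retrieval score times credibility score."""
    return rerank_top_n(run, n, key=lambda p: p[1] * p[2])


if __name__ == "__main__":
    # Made-up initial run, already sorted by retrieval score.
    baseline = [("a", 0.9, 0.4), ("b", 0.8, 0.9), ("c", 0.3, 0.5), ("d", 0.2, 1.0)]
    print([p[0] for p in credibility_rerank(baseline, 3)])
    print([p[0] for p in combined_rerank(baseline, 3)])
```

In this toy run, post "d" has the highest credibility score but sits outside the top n = 3, so neither scheme promotes it. That is the effect of restricting reranking to the head of the initial run.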
