Prior work on search result diversification focuses on achieving topical variety in a ranked list, typically treating all aspects equally. In this paper, we diversify by sentiment according to an explicit bias. We want to allow users to switch the result perspective to better grasp the polarity of opinionated content, for example during a literature review. For this, we first infer the prior sentiment bias inherent in a controversial topic, the 'Topic Sentiment'. We then use this information in three ways to diversify results according to different sentiment biases: (1) equal diversification, which achieves a balanced and unbiased representation of all sentiments on the topic; (2) diversification towards the Topic Sentiment, in which the actual sentiment bias of the topic is mirrored to emphasize its general perception; and (3) diversification against the Topic Sentiment, in which documents expressing the 'minority' or outlying sentiment(s) are boosted and those with the popular sentiment are demoted. Since sentiment classification is essential to this task, we gradually degrade the accuracy of a perfect classifier down to 40% and show which diversification approaches remain most stable in this setting. The results reveal that the proportionality-based methods and our SCSF model, which considers sentiment strength and frequency in the diversified list, yield the highest gains. Further, for cases where the Topic Sentiment cannot be reliably estimated, we show how performance is affected when equal diversification is applied although an emphasis either towards or against the Topic Sentiment is actually desired: in the former case, an average of 6.48% is lost across all evaluation measures, whereas in the latter case the loss is 16.23%, confirming that bias-specific sentiment diversification is crucial.
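To make the three biases concrete, here is a minimal sketch, assuming three hypothetical sentiment labels, a Topic Sentiment estimated simply as label frequencies in an initial ranking, and an "against" target obtained by an assumed inversion of those frequencies. It is not the SCSF model: as a stand-in it uses a proportionality-based selection in the style of PM-2 (Sainte-Laguë seat allocation), where each document's sentiment probabilities and relevance score are taken as given. The names `topic_sentiment`, `target_distribution`, and `pm2_diversify` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical label set; the paper's sentiment classes may differ.
SENTIMENTS = ("positive", "neutral", "negative")

# A document is (doc_id, relevance score, P(sentiment | doc)).
Doc = Tuple[str, float, Dict[str, float]]


def topic_sentiment(labels: List[str]) -> Dict[str, float]:
    """Estimate the Topic Sentiment as label frequencies in an initial ranking (assumption)."""
    counts = {s: 0 for s in SENTIMENTS}
    for label in labels:
        counts[label] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def target_distribution(topic: Dict[str, float], bias: str) -> Dict[str, float]:
    """Turn the Topic Sentiment into target proportions for one of the three biases."""
    if bias == "equal":
        return {s: 1.0 / len(topic) for s in topic}
    if bias == "towards":
        return dict(topic)
    if bias == "against":
        # Hypothetical inversion: weight each sentiment by how under-represented it is.
        inverted = {s: 1.0 - p for s, p in topic.items()}
        z = sum(inverted.values())
        return {s: v / z for s, v in inverted.items()}
    raise ValueError(f"unknown bias: {bias}")


def pm2_diversify(docs: List[Doc], target: Dict[str, float], k: int, lam: float = 0.5) -> List[str]:
    """Select k documents so sentiment proportions follow `target` (PM-2-style seat allocation)."""
    votes = {s: target[s] * k for s in target}  # seats each sentiment "deserves"
    seats = {s: 0.0 for s in target}            # seats already occupied
    pool = list(docs)
    ranking: List[str] = []
    while pool and len(ranking) < k:
        # Sainte-Lague quotient: which sentiment is most under-served right now?
        quotient = {s: votes[s] / (2 * seats[s] + 1) for s in target}
        best = max(quotient, key=quotient.get)

        def score(doc: Doc) -> float:
            _, rel, probs = doc
            main = lam * quotient[best] * rel * probs.get(best, 0.0)
            rest = (1 - lam) * sum(
                quotient[s] * rel * probs.get(s, 0.0) for s in target if s != best
            )
            return main + rest

        chosen = max(pool, key=score)
        pool.remove(chosen)
        ranking.append(chosen[0])
        # Spread one seat across sentiments in proportion to the chosen document's probabilities.
        probs = chosen[2]
        z = sum(probs.values()) or 1.0
        for s in target:
            seats[s] += probs.get(s, 0.0) / z
    return ranking
```

With a mostly positive topic, `target_distribution(topic, "against")` shifts seats to the neutral and negative documents, so they are promoted ahead of higher-scoring positive ones. Feeding noisier sentiment probabilities into `docs` is one way to mimic the degraded-classifier setting.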
Passage retrieval is a crucial step in question answering systems and has been well researched in the past. Due to the vocabulary mismatch problem and the independence assumption of bag-of-words retrieval models, correct passages are often ranked below incorrect ones in the retrieved list. Whereas previous work reranks passages only on the basis of the syntactic structures of questions and answers, our method achieves a better ranking by aligning these syntactic structures based on the question's answer type and the named entities detected in the candidate passage. We compare our technique with strong retrieval and reranking baselines. Experimental results on the TREC QA 1999-2003 datasets show that our method significantly outperforms the baselines across all ranks in terms of mean reciprocal rank (MRR).
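The sketch below only illustrates the answer-type signal and the MRR measure; it does not perform syntactic alignment. It assumes each passage already carries a retrieval score and a set of detected entity types, and adds a hypothetical fixed bonus `alpha` when the passage contains an entity of the question's expected answer type. The function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple

# (passage_id, retrieval score, entity types detected in the passage)
Passage = Tuple[str, float, Set[str]]


def rerank_by_answer_type(passages: List[Passage], answer_type: str, alpha: float = 1.0) -> List[str]:
    """Boost passages containing a named entity of the question's expected answer type."""
    def boosted(p: Passage) -> float:
        _, score, entity_types = p
        return score + (alpha if answer_type in entity_types else 0.0)

    return [pid for pid, _, _ in sorted(passages, key=boosted, reverse=True)]


def mean_reciprocal_rank(rankings: Sequence[List[str]], relevant: Sequence[Set[str]]) -> float:
    """MRR: average over questions of 1 / rank of the first correct passage (0 if none)."""
    total = 0.0
    for ranking, correct in zip(rankings, relevant):
        for rank, pid in enumerate(ranking, start=1):
            if pid in correct:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

For a "Who ...?" question with answer type `PERSON`, a correct passage ranked second by retrieval score alone moves to first place, raising its reciprocal rank from 0.5 to 1.0.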
Reading congressional legislation, also known as bills, is often tedious because bills tend to be long and written in complex language. In IBM Many Bills, an interactive web-based visualization of legislation, users of different backgrounds can browse bills and quickly explore the parts that interest them. One task users need support for is locating sections that do not seem to fit the overall topic of the bill. In this paper, we present novel techniques, drawing on approaches from information retrieval, to determine which sections within a bill are likely to be outliers. The most promising techniques first detect the most topically relevant parts of a bill by ranking its sections, and then compare these topically relevant parts with the remaining sections of the bill. To compare sections, we use various dissimilarity metrics based on Kullback-Leibler divergence. The results indicate that these techniques are more successful than a classification-based approach. Finally, we analyze how well the dissimilarity metrics discriminate between sections that are strong outliers and those that are 'milder' outliers.
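A minimal sketch of the two-step idea, assuming unigram language models, a ranking step that orders sections by KL divergence from the whole-bill model, a hypothetical core size `k`, and Jelinek-Mercer smoothing of the core against the bill (with `lam = 0.8` chosen arbitrarily). The paper uses several KL-based dissimilarity metrics; this shows just one plain variant.

```python
import math
from collections import Counter
from typing import Dict, List


def mle(tokens: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram language model."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q); q must be non-zero wherever p is non-zero."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())


def smoothed(tokens: List[str], background: Dict[str, float], lam: float = 0.8) -> Dict[str, float]:
    """Jelinek-Mercer smoothing of `tokens` against the whole-bill model."""
    fg = mle(tokens)
    return {w: lam * fg.get(w, 0.0) + (1 - lam) * pb for w, pb in background.items()}


def outlier_scores(sections: Dict[str, List[str]], k: int = 2) -> Dict[str, float]:
    """Score non-core sections by divergence from the bill's topical core (higher = more outlying)."""
    bill = mle([t for toks in sections.values() for t in toks])
    # Step 1: rank sections by closeness to the whole bill and keep the top k as the "core".
    ranked = sorted(sections, key=lambda s: kl_divergence(mle(sections[s]), bill))
    core_tokens = [t for s in ranked[:k] for t in sections[s]]
    core = smoothed(core_tokens, bill)
    # Step 2: compare every remaining section against the smoothed core model.
    return {s: kl_divergence(mle(sections[s]), core) for s in ranked[k:]}
```

Because every section word occurs somewhere in the bill, the smoothed core model is never zero on a section's vocabulary, so the divergence stays finite. A rider about naming a post office inside an energy-tax bill receives the largest score.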
It is well known that clickthrough data can be used to improve the effectiveness of search results: broadly speaking, a query's past clicks are a predictor of future clicks on documents. However, when a new or unusual query appears, or when a system is not as widely used as a mainstream web search engine, there may be little or no click data available to improve the results. Existing methods for improving performance on sparse queries extend the query-document click relationship to more documents or queries, but require substantial clickthrough data from other queries. In this work, we describe a way to improve results for rarely clicked queries in a system where only limited clickthrough data is available for all queries. We use information from co-click queries, subset queries, and synonym queries to estimate the clickthrough for a sparse query, describe a probabilistic approach for carrying out that estimation, and use the estimate to rerank retrieved documents. Our experiments on a query log from a medical informatics company demonstrate that when overall clickthrough data is sparse, reranking search results using clickthrough information from related queries significantly outperforms reranking that uses clickthrough information from the query alone.
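One simple way to picture the estimate is a weighted mixture of click distributions drawn from the query itself and from each family of related queries. The sketch below assumes that mixture form, hypothetical per-source weights, and a linear interpolation with the retrieval score for reranking; the paper's probabilistic model may be formulated differently. How related queries are found (co-clicks, term subsets, synonyms) is taken as given.

```python
from collections import Counter
from typing import Dict, List, Tuple


def click_distribution(clicks: Counter) -> Dict[str, float]:
    """Normalize raw click counts into P(doc clicked | query group)."""
    n = sum(clicks.values())
    return {d: c / n for d, c in clicks.items()} if n else {}


def estimate_clicks(query: str, log: Dict[str, Counter], related: Dict[str, List[str]],
                    weights: Dict[str, float]) -> Dict[str, float]:
    """Mix click distributions from the query itself and from each related-query source.

    Sources with no clicks are skipped and the remaining weights renormalized,
    so a query with no clicks of its own falls back on its related queries.
    """
    sources = {"self": [query], **related}
    estimate: Counter = Counter()
    total_weight = 0.0
    for name, queries in sources.items():
        merged: Counter = Counter()
        for q in queries:
            merged.update(log.get(q, Counter()))
        dist = click_distribution(merged)
        if not dist:
            continue
        total_weight += weights[name]
        for doc, p in dist.items():
            estimate[doc] += weights[name] * p
    return {d: v / total_weight for d, v in estimate.items()} if total_weight else {}


def rerank(results: List[Tuple[str, float]], click_probs: Dict[str, float], beta: float = 0.5) -> List[str]:
    """Interpolate a retrieval score in [0, 1] with the estimated click probability."""
    def combined(r: Tuple[str, float]) -> float:
        return (1 - beta) * r[1] + beta * click_probs.get(r[0], 0.0)
    return [doc for doc, _ in sorted(results, key=combined, reverse=True)]
```

For a hypothetical unseen query such as "pediatric asthma inhaler dosage", clicks logged for the subset query "asthma inhaler dosage" and a synonym query can promote a document the retrieval model placed lower.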
In this work, we address coreference retrieval, which involves identifying aliases that are distinct references to the same entity. Starting from a known alias, we discover unknown aliases that refer to the same entity. We use Entity Language Models to capture the contextual language around the known alias, which aids in finding new aliases. We also show that modeling the significant dates of the known aliases improves alias discovery performance.
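A rough sketch of the idea, assuming a context window of a few tokens around each mention, an additively smoothed unigram model as a stand-in for the Entity Language Model, and a hypothetical date signal computed as Jaccard overlap between the date sets attached to the known alias and to a candidate. The combination weight `gamma` and all names are illustrative, not the paper's formulation.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def context_tokens(docs: List[List[str]], alias: str, window: int = 3) -> List[str]:
    """Collect the words within `window` tokens on either side of each mention of `alias`."""
    ctx: List[str] = []
    for toks in docs:
        for i, tok in enumerate(toks):
            if tok == alias:
                ctx.extend(toks[max(0, i - window):i] + toks[i + 1:i + 1 + window])
    return ctx


def context_lm(tokens: List[str], vocab: Set[str], mu: float = 0.1) -> Dict[str, float]:
    """Additively smoothed unigram model over a fixed vocabulary (never zero)."""
    counts = Counter(tokens)
    denom = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / denom for w in vocab}


def alias_scores(docs: List[List[str]], known: str, candidates: List[str],
                 dates: Dict[str, Set[str]], gamma: float = 1.0) -> Dict[str, float]:
    """Score candidates by context similarity to the known alias plus date overlap."""
    vocab = {t for toks in docs for t in toks}
    entity = context_lm(context_tokens(docs, known), vocab)
    known_dates = dates.get(known, set())
    scores: Dict[str, float] = {}
    for cand in candidates:
        cand_lm = context_lm(context_tokens(docs, cand), vocab)
        # Lower KL(candidate || entity) means the candidate appears in similar contexts.
        divergence = sum(p * math.log(p / entity[w]) for w, p in cand_lm.items())
        cand_dates = dates.get(cand, set())
        union = known_dates | cand_dates
        overlap = len(known_dates & cand_dates) / len(union) if union else 0.0
        scores[cand] = -divergence + gamma * overlap
    return scores
```

A candidate that shares both the surrounding vocabulary and some significant dates with the known alias scores higher than one that shares neither.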