Abstract. An open challenge in information distillation is the evaluation and optimization of the utility of ranked lists with respect to flexible user interactions over multiple sessions. Utility depends on both the relevance and novelty of documents, and the novelty in turn depends on the user interaction history. However, user behavior is non-deterministic. We propose a new probabilistic framework for stochastic modeling of user behavior when browsing multi-session ranked lists, and a novel approximation method for efficient computation of the expected utility over numerous user-interaction patterns. Using this framework, we present the first utility-based evaluation over multi-session search scenarios defined on the TDT4 corpus of news stories, using a state-of-the-art information distillation system. We demonstrate that the distillation system obtains a 56.6% utility enhancement by combining multi-session adaptive filtering with novelty detection and utility-based optimization of system parameters for optimal ranked list lengths. Key words: Multi-session distillation, utility evaluation based both on novelty and relevance, stochastic modeling of user browsing behavior.
This paper examines a new approach to information distillation over temporally ordered documents, and proposes a novel evaluation scheme for such a framework. It combines the strengths of and extends beyond conventional adaptive filtering, novelty detection and non-redundant passage ranking with respect to long-lasting information needs ('tasks' with multiple queries). Our approach supports fine-grained user feedback via highlighting of arbitrary spans of text, and leverages such information for utility optimization in adaptive settings. For our experiments, we defined hypothetical tasks based on news events in the TDT4 corpus, with multiple queries per task. Answer keys (nuggets) were generated for each query and a semiautomatic procedure was used for acquiring rules that allow automatically matching nuggets against system responses. We also propose an extension of the NDCG metric for assessing the utility of ranked passages as a combination of relevance and novelty. Our results show encouraging utility enhancements using the new approach, compared to the baseline systems without incremental learning or the novelty detection components.
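To make the idea of scoring ranked passages by both relevance and novelty concrete, here is a minimal sketch. It is not the metric defined in the paper: the function name `novelty_dcg`, the dampening factor `gamma`, and the representation of each passage as a set of matched nugget identifiers are all assumptions made for illustration. Each nugget earns full credit the first time it appears and geometrically less on each repeat, and the gain at each rank is discounted logarithmically as in DCG. Normalization against an ideal ranking is omitted.

```python
from collections import Counter
from math import log2


def novelty_dcg(ranked_passages, gamma=0.5):
    """DCG-style utility where repeated nuggets earn dampened credit.

    ranked_passages: list of sets of nugget ids, in rank order (hypothetical format).
    gamma: assumed dampening factor in [0, 1]; 0 means repeats earn nothing.
    """
    seen = Counter()  # how many times each nugget has already been shown
    total = 0.0
    for rank, nuggets in enumerate(ranked_passages, start=1):
        # A nugget seen k times before contributes gamma ** k.
        gain = sum(gamma ** seen[n] for n in nuggets)
        seen.update(nuggets)
        # Standard DCG rank discount.
        total += gain / log2(rank + 1)
    return total


if __name__ == "__main__":
    redundant = [{"n1"}, {"n1"}]
    diverse = [{"n1"}, {"n2"}]
    print(novelty_dcg(redundant), novelty_dcg(diverse))
```

Under these assumptions, a list that repeats the same nugget scores lower than one that surfaces a new nugget at the same rank, which is the behavior a combined relevance and novelty measure is meant to reward.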
Popular methods for probabilistic topic modeling such as Latent Dirichlet Allocation (LDA, [1]) and Correlated Topic Models (CTM, [2]) share an important property: they use a common set of topics to model all the data. This property can be too restrictive for modeling complex data entries where multiple fields of heterogeneous data jointly provide rich information about each object or event. We propose a new extension of the CTM method to enable modeling with multi-field topics in a global graphical structure, and a mean-field variational algorithm to allow joint learning of multinomial topic models from discrete data and Gaussian-style topic models for real-valued data. We conducted experiments with both simulated and real data, and observed that the multi-field CTM outperforms a conventional CTM in both likelihood maximization and perplexity reduction. A deeper analysis of the simulated data reveals that the superior performance is the result of successful discovery of the mapping among field-specific topics and observed data.
Many applications involve a set of prediction tasks that must be accomplished sequentially through user interaction. If the tasks are interdependent, the order in which they are performed may have a significant impact on the overall performance of the prediction systems. However, manual specification of an optimal order may be difficult when the interdependencies are complex, especially if the number of tasks is large, making exhaustive search intractable. This paper presents the first attempt at solving the optimal task ordering problem using an approximate formulation in terms of pairwise task order preferences, reducing the problem to the well-known Linear Ordering Problem. We propose two approaches for inducing the pairwise task order preferences: 1) a classifier-agnostic approach based on conditional entropy that determines the prediction tasks whose correct labels lead to the least uncertainty for the remaining predictions, and 2) a classifier-dependent approach that empirically determines which tasks are favored before others for better predictive performance. We apply the proposed solutions to two practical applications that involve computer-assisted trouble report generation and document annotation, respectively. In both applications, the user fills in a series of fields and at each step, the system is expected to provide useful suggestions, which comprise the prediction (i.e., classification and ranking) tasks. Our experiments show encouraging improvements in predictive performance, as compared to approaches that do not take task dependencies into account.
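The reduction from pairwise order preferences to a linear ordering can be illustrated with a small sketch. This is an illustrative toy, not the authors' procedure: the preference score `-H(Y_j | Y_i)` is one assumed way to turn conditional entropy into a pairwise preference, the helper names are hypothetical, and the ordering is found by brute force over permutations, which is only feasible for a handful of tasks (the Linear Ordering Problem is NP-hard in general).

```python
from collections import Counter
from itertools import permutations
from math import log2


def conditional_entropy(xs, ys):
    """Estimate H(Y | X) in bits from paired label samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    marginal = Counter(xs)
    return -sum(c / n * log2(c / marginal[x]) for (x, _), c in joint.items())


def preference_matrix(columns):
    """pref[i][j] is high when knowing task i's label leaves little
    uncertainty about task j (assumed scoring: -H(Y_j | Y_i))."""
    n = len(columns)
    return [
        [0.0 if i == j else -conditional_entropy(columns[i], columns[j])
         for j in range(n)]
        for i in range(n)
    ]


def best_order(pref):
    """Brute-force Linear Ordering: maximize the sum of pref[i][j]
    over all pairs where task i is placed before task j."""
    n = len(pref)

    def score(order):
        return sum(pref[order[a]][order[b]]
                   for a in range(n) for b in range(a + 1, n))

    return max(permutations(range(n)), key=score)


if __name__ == "__main__":
    # Toy labels: task 1 is fully determined by task 0 (parity of its label).
    task0 = [0, 1, 2, 3, 0, 1, 2, 3]
    task1 = [v % 2 for v in task0]
    print(best_order(preference_matrix([task0, task1])))
```

In the toy data, the second task is a deterministic function of the first, so asking for the first task's label first removes all uncertainty about the second. For realistic numbers of tasks, the brute-force search would need to be replaced by a heuristic or exact Linear Ordering solver.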