Exploratory ad-hoc queries could return too many answers -a phenomenon commonly referred to as "information overload". In this paper, we propose to automatically categorize the results of SQL queries to address this problem. We dynamically generate a labeled, hierarchical category structure -users can determine whether a category is relevant or not by examining simply its label; she can then explore just the relevant categories and ignore the remaining ones, thereby reducing information overload. We first develop analytical models to estimate information overload faced by a user for a given exploration. Based on those models, we formulate the categorization problem as a cost optimization problem and develop heuristic algorithms to compute the min-cost categorization.
In addition to purely occurrence-based relevance models, term proximity has been frequently used to enhance retrieval quality of keyword-oriented retrieval systems. While there have been approaches on effective scoring functions that incorporate proximity, there has not been much work on algorithms or access methods for their efficient evaluation. This paper presents an efficient evaluation framework including a proximity scoring function integrated within a top-k query engine for text retrieval. We propose precomputed and materialized index structures that boost performance. The increased retrieval effectiveness and efficiency of our framework are demonstrated through extensive experiments on a very large text benchmark collection. In combination with static index pruning for the proximity lists, our algorithm achieves an improvement of two orders of magnitude compared to a term-based top-k evaluation, with a significantly improved result quality
Abstract. With the advent of ubiquitous computing, we can easily acquire the locations of moving objects. This paper studies clustering problems for trajectory data that is constrained by the road network. While many trajectory clustering algorithms have been proposed, they do not consider the spatial proximity of objects across the road network. For this kind of data, we propose a new distance measure that reflects the spatial proximity of vehicle trajectories on the road network, and an efficient clustering method that reduces the number of distance computations during the clustering process. Experimental results demonstrate that our proposed method correctly identifies clusters using real-life trajectory data yet reduces the distance computations by up to 80% against the baseline algorithm.
A commercial web search engine shards its index among many servers, and therefore the response time of a search query is dominated by the slowest server that processes the query. Prior approaches target improving responsiveness by reducing the tail latency of an individual search server. They predict query execution time, and if a query is predicted to be long-running, it runs in parallel, otherwise it runs sequentially. These approaches are, however, not accurate enough for reducing a high tail latency when responses are aggregated from many servers because this requires each server to reduce a substantially higher tail latency (e.g., the 99.99th-percentile), which we call extreme tail latency.We propose a prediction framework to reduce the extreme tail latency of search servers. The framework has a unique set of characteristics to predict long-running queries with high recall and improved precision. Specifically, prediction is delayed by a short duration to allow many short-running queries to complete without parallelization, and to allow the predictor to collect a set of dynamic features using runtime information. These features estimate query execution time with high accuracy. We also use them to estimate the prediction errors to override an uncertain prediction by selectively accelerating the query for a higher recall.We evaluate the proposed prediction framework to improve search engine performance in two scenarios using a simulation study: (1) query parallelization on a multicore processor, and (2) query scheduling on a heterogeneous processor. The results show that, for both scenarios, the proposed framework is effective in reducing the extreme tail latency compared to a start-of-the-art predictor because of its higher recall, and it improves server throughput by more than 70% because of its improved precision.
The analysis and findings reported here are from a self-report questionnaire survey of a sample of 1,035 high school students in Pusan, a metropolitan area of South Korea. Multiple regression and path analyses reveal that, for all types of drug behavior among these adolescents, the influence of parental variables was generally less than the influence of the peer variables. Even in South Korean society, where the stability and authority of the family is greater than in American society, peers have a greater influence than do parents on adolescents_ engaging in or refraining from deviant behavior. The findings conform more to the expectations of social learning theory than to those of social bonding theory, and generally replicate findings from research on adolescent drug use in the United States. Further research is clearly needed, but the findings here suggest that the social processes of substance use among adolescents and the theoretical explanations focusing on those processes are not confined to western societies.
Software developers increasingly rely on information from the Web, such as documents or code examples on application programming interfaces (APIs), to facilitate their development processes. However, API documents often do not include enough information for developers to fully understand how to use the APIs, and searching for good code examples requires considerable effort. To address this problem, we propose a novel code example recommendation system that combines the strength of browsing documents and searching for code examples and returns API documents embedded with high-quality code example summaries mined from the Web. Our evaluation results show that our approach provides code examples with high precision and boosts programmer productivity.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.