Recent years have witnessed an unprecedented proliferation of social media. People around the globe author, every day, millions of blog posts, micro-blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories and events that capture popular attention. Stories can be identified via groups of tightly coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point in time. The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DYNDENS, which outperforms adaptations of existing techniques to this setting, and yields meaningful results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.
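To make the notion of "the magnitude of change that a single edge weight update can cause" concrete, here is a minimal sketch. It assumes one particular density definition (total internal edge weight divided by the number of nodes), which is only one of the many definitions the abstract alludes to; the function `density` and the toy graph are hypothetical, and this is not the DYNDENS algorithm itself.

```python
from itertools import combinations


def density(weights, nodes):
    """Total weight of edges inside `nodes`, divided by the number of nodes.

    This is an assumed, illustrative density definition, not necessarily
    one used in the paper.
    """
    ordered = sorted(nodes)
    total = sum(
        weights.get(frozenset(pair), 0.0) for pair in combinations(ordered, 2)
    )
    return total / len(ordered)


# Toy entity graph: undirected edge weights keyed by the pair of endpoints.
weights = {
    frozenset({"a", "b"}): 1.0,
    frozenset({"b", "c"}): 0.5,
    frozenset({"a", "c"}): 0.5,
}
group = {"a", "b", "c"}

before = density(weights, group)

# A streaming update raises the weight of one edge by `delta`.
delta = 0.6
weights[frozenset({"a", "b"})] += delta

after = density(weights, group)

# Under this definition, any subgraph containing both endpoints gains
# exactly delta / |nodes| in density; all other subgraphs are unaffected.
change = after - before
```

Under this assumed definition, the effect of one update is bounded and local, which is the kind of property that makes incremental maintenance plausible instead of recomputing all dense subgraphs from scratch.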
Queries asked on web search engines often target structured data, such as commercial products, movie showtimes, or airline schedules. However, surfacing relevant results from such data is a highly challenging problem, due to the unstructured language of web queries and the imposing scalability and speed requirements of web search. In this paper, we discover latent structured semantics in web queries and produce Structured Annotations for them. We consider an annotation as a mapping of a query to a table of structured data and attributes of this table. Given a collection of structured tables, we present a fast and scalable tagging mechanism for obtaining all possible annotations of a query over these tables. However, we observe that for a given query only a few annotations are sensible for the user's needs. We thus propose a principled probabilistic scoring mechanism, using a generative model, for assessing the likelihood of a structured annotation, and we define a dynamic threshold for filtering out misinterpreted query annotations. Our techniques are completely unsupervised, obviating the need for costly manual labeling effort. We evaluated our techniques using real-world queries and data and present promising experimental results.
The problem of skyline computation has attracted considerable research attention. In the categorical domain the problem becomes more complicated, primarily due to the partially ordered nature of the attributes of tuples. In this paper, we initiate a study of streaming categorical skylines. We identify the limitations of existing work for offline categorical skyline computation and devise novel techniques for the problem of maintaining the skyline of categorical data in a streaming environment. In particular, we develop a lightweight data structure for indexing the tuples in the streaming buffer that can gracefully adapt to tuples with many attributes and partially ordered domains of any size and complexity. Additionally, our study of the dominance relation in the dual space allows us to utilize geometric arrangements in order to index the categorical skyline and efficiently evaluate dominance queries. Lastly, a thorough experimental study evaluates the efficiency of the proposed techniques.
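The dominance relation over partially ordered attributes can be illustrated with a naive baseline. The sketch below assumes each attribute's partial order is given as a set of (better, worse) pairs and recomputes the skyline by pairwise comparison; the helper names and the toy data are hypothetical, and this quadratic approach is not the indexing or dual-space technique the abstract describes.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a set of (better, worse) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def dominates(t, u, orders):
    """True if t is at least as good as u on every attribute and strictly
    better on at least one. Incomparable values block dominance."""
    strictly_better = False
    for x, y, order in zip(t, u, orders):
        if x == y:
            continue
        if (x, y) in order:
            strictly_better = True
        else:
            return False  # y is better than x, or the two are incomparable
    return strictly_better


def skyline(tuples, orders):
    """Naive skyline: keep every tuple that no other tuple dominates."""
    return [
        t for t in tuples
        if not any(dominates(u, t, orders) for u in tuples if u != t)
    ]


# Attribute 1: brand, with A preferred to B and B preferred to C.
# Attribute 2: color, with red preferred to blue; green is incomparable.
orders = [
    transitive_closure({("A", "B"), ("B", "C")}),
    transitive_closure({("red", "blue")}),
]
buffer = [
    ("A", "red"), ("B", "red"), ("C", "green"), ("A", "blue"), ("B", "green"),
]
result = skyline(buffer, orders)
```

Note that `("B", "green")` survives even though `("A", "red")` has a better brand, because red and green are incomparable. This is the complication that partial orders introduce compared with totally ordered numeric attributes.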
User-generated content and social media (in the form of blogs, wikis, online video, microblogs, etc.) are proliferating online. Grapevine conducts large-scale data analysis on the social media collective, distilling and extracting information in real time. It aims to track entities and stories of interest in millions of blog posts, thousands of tweets, news items, etc., daily. Grapevine facilitates the interactive exploration of content, allowing users to discover interesting or surprising stories, optionally narrowed down to a specific demographic of interest (e.g. "What are Torontonians talking about on blogs?", "What are popular stories across news sources in Canada?", "What are financiers in Texas blogging about today?"). Stories of interest can be explored in a variety of ways, such as modifying their scope, obtaining related content (blog posts, news, etc.), and examining their temporal evolution.
In this paper, we introduce a family of expressive models for qualitative spatial reasoning with directions. The proposed family is based on the cognitively plausible cone-based model. We formally define the directional relations that can be expressed in each model of the family. Then, we use our formal framework to study two interesting problems: computing the inverse of a directional relation and composing two directional relations. For the composition operator, in particular, we concentrate on two commonly used definitions, namely, consistency-based and existential composition. Our formal framework allows us to prove that our solutions are correct. The presented solutions are handled in a uniform manner and apply to all of the models of the family. Index Terms: Spatial databases and GIS, cone-based directional relations, inverse and composition operators.
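As a rough illustration of the cone-based idea, the sketch below assigns a direction to a target point relative to a reference point by partitioning the plane around the reference into equal angular cones. It is a point-based simplification under assumed conventions (eight cones, each centered on a compass direction); the names `cone_direction` and `inverse` are hypothetical, and the family of models in the paper is more expressive than this.

```python
import math

# Eight cones, listed counter-clockwise starting from East (an assumption).
DIRECTIONS_8 = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def cone_direction(ref, target, names=DIRECTIONS_8):
    """Direction of `target` as seen from `ref`, using equal-width cones
    centered on the named directions."""
    dx = target[0] - ref[0]
    dy = target[1] - ref[1]
    if dx == 0 and dy == 0:
        raise ValueError("direction is undefined for coincident points")
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    width = 360.0 / len(names)
    # Shift by half a cone so that each cone is centered on its direction.
    index = int(((angle + width / 2.0) % 360.0) // width)
    return names[index]


def inverse(direction, names=DIRECTIONS_8):
    """For points, the inverse relation is the diametrically opposite cone."""
    return names[(names.index(direction) + len(names) // 2) % len(names)]


a, b = (0.0, 0.0), (-1.0, -1.0)
b_from_a = cone_direction(a, b)  # direction of b as seen from a
a_from_b = cone_direction(b, a)  # direction of a as seen from b
```

In this simplified point setting, the inverse is always a single opposite direction, so `inverse(b_from_a)` equals `a_from_b`. Richer models may not share this property, which is part of why the inverse and composition operators merit formal study.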