The LogicBlox system aims to reduce the complexity of software development for modern applications which enhance and automate decision-making and enable their users to evolve their capabilities via a "self-service" model. Our perspective in this area is informed by over twenty years of experience building dozens of missioncritical enterprise applications that are in use by hundreds of large enterprises across industries such as retail, telecommunications, banking, and government. We designed and built LogicBlox to be the system we wished we had when developing those applications.In this paper, we discuss the design considerations behind the LogicBlox system and give an overview of its implementation, highlighting innovative aspects. These include: LogiQL, a unified and declarative language based on Datalog; the use of purely functional data structures; novel join processing strategies; advanced incremental maintenance and live programming facilities; a novel concurrency control scheme; and built-in support for prescriptive and predictive analytics.
Various approaches for keyword proximity search have been implemented in relational databases, XML and the Web. Yet, in all of them, an answer is a Q-fragment, namely, a subtree T of the given data graph G, such that T contains all the keywords of the query Q and has no proper subtree with this property. The rank of an answer is inversely proportional to its weight. Three problems are of interest: finding an optimal (i.e., top-ranked) answer, computing the top-k answers and enumerating all the answers in ranked order. It is shown that, under data complexity, an efficient algorithm for solving the first problem is sufficient for solving the other two problems with polynomial delay. Similarly, an efficient algorithm for finding a θ-approximation of the optimal answer suffices for carrying out the following two tasks with polynomial delay, under query-and-data complexity. First, enumerating in a (θ + 1)-approximate order. Second, computing a (θ + 1)-approximation of the top-k answers. As a corollary, this paper gives the first efficient algorithms, under data complexity, for enumerating all the answers in ranked order and for computing the top-k answers. It also gives the first efficient algorithms, under query-and-data complexity, for enumerating in a provably approximate order and for computing an approximation of the top-k answers.
In keyword search over data graphs, an answer is a nonredundant subtree that includes the given keywords. An algorithm for enumerating answers is presented within an architecture that has two main components: an engine that generates a set of candidate answers and a ranker that evaluates their score. To be effective, the engine must have three fundamental properties. It should not miss relevant answers, has to be efficient and must generate the answers in an order that is highly correlated with the desired ranking. It is shown that none of the existing systems has implemented an engine that has all of these properties. In contrast, this paper presents an engine that generates all the answers with provable guarantees. Experiments show that the engine performs well in practice. It is also shown how to adapt this engine to queries under the OR semantics. In addition, this paper presents a novel approach for implementing rankers destined for eliminating redundancy. Essentially, an answer is ranked according to its individual properties (relevancy) and its intersection with the answers that have already been presented to the user. Within this approach, experiments with specific rankers are described.
Various known models of probabilistic XML can be represented as instantiations of abstract p-documents. Such documents have, in addition to ordinary nodes, distributional nodes that specify the probabilistic process of generating a random document. Within this abstraction, families of pdocuments, which are natural extensions and combinations of previous models, are considered. The focus is on efficiency of applying twig queries (with projection) to p-documents. A closely related issue is the ability to (efficiently) translate a given document of one family into another family. Furthermore, both of these tasks have two variants that correspond to the value-based and object-based semantics.The translation relationships among different families of p-documents are studied. An efficient algorithm for evaluating twig queries over one specific family is given. This algorithm generalizes a known algorithm and significantly improves its running time, both analytically and experimentally. It is shown that this family is the maximal, among the ones considered, for which query evaluation is tractable. For the rest, efficient approximate algorithms for query evaluation are presented.
Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic distribution. Particular families of p-documents are determined by the types of distributional nodes that can be used as well as by the structural constraints on the placement of those nodes in a p-document. Some of the resulting families provide natural extensions and combinations of previously studied probabilistic XML models. The focus of the paper is on the expressive power of families of p-documents. In particular, two main issues are studied. Some of the results described in this paper were reported in [1,2].
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.