Abstract:
Keywords: graph queries, relational algebra, query optimization
Introduction
The key components of Big Data are often defined as variety, velocity and volume [28] of data. Applications operating on continuously changing graphs are a prime example: the semi-structured graph-like nature introduces a high variety, changes happen at high velocity, and datasets are often high-volume. Such applications include fraud detection in financial transactions [27], validation of engineering models [3], and static analysis of …
“…As future work, we plan to provide a formalisation based on graph-specific theoretical query frameworks, such as [12]. We will also give the formal specification of the operators for incremental query evaluation, which requires the definition of maintenance operations that keep the result in sync with the latest set of changes [22]. Our long-term research objective is to design an openCypher-compatible distributed, incremental graph query engine [20].…”
Graph database systems are increasingly adapted for storing and processing heterogeneous network-like datasets. However, due to the novelty of such systems, no standard data model or query language has yet emerged. Consequently, migrating datasets or applications even between related technologies often requires a large amount of manual work or ad-hoc solutions, thus subjecting the users to the possibility of vendor lock-in. To avoid this threat, vendors are working on supporting existing standard languages (e.g. SQL) or standardising languages. In this paper, we present a formal specification for openCypher, a high-level declarative graph query language with an ongoing standardisation effort. We introduce relational graph algebra, which extends relational operators by adapting graph-specific operators, and define a mapping from core openCypher constructs to this algebra. We propose an algorithm that allows systematic compilation of openCypher queries.
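To make the idea of compiling a graph pattern into algebra operators more concrete, the following minimal Python sketch evaluates one pattern as a small operator pipeline. Everything in it is an illustrative assumption rather than the paper's actual definitions: the operator names `get_vertices`, `expand_out` and `project`, the `Vertex` and `Edge` classes, and the sample query `MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a, b` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    id: int
    labels: frozenset


@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    type: str


def get_vertices(vertices, var, label):
    """Hypothetical nullary operator: one row per vertex carrying `label`."""
    return [{var: v.id} for v in vertices if label in v.labels]


def expand_out(rows, edges, vertices, src_var, dst_var, edge_type, dst_label):
    """Hypothetical unary operator: extend each row along matching outgoing edges."""
    by_id = {v.id: v for v in vertices}
    out = []
    for row in rows:
        for e in edges:
            if (e.src == row[src_var] and e.type == edge_type
                    and dst_label in by_id[e.dst].labels):
                out.append({**row, dst_var: e.dst})
    return out


def project(rows, variables):
    """Relational projection onto the listed variables."""
    return [tuple(row[v] for v in variables) for row in rows]


# A tiny sample graph (made up for this illustration).
vertices = [
    Vertex(1, frozenset({"Person"})),
    Vertex(2, frozenset({"Person"})),
    Vertex(3, frozenset({"City"})),
]
edges = [Edge(1, 2, "KNOWS"), Edge(1, 3, "LIVES_IN"), Edge(2, 1, "KNOWS")]

# Assumed query: MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a, b
plan_result = project(
    expand_out(get_vertices(vertices, "a", "Person"),
               edges, vertices, "a", "b", "KNOWS", "Person"),
    ["a", "b"],
)
```

The point of the sketch is only that a pattern can be read as a tree of operators whose leaves produce vertex bindings and whose inner nodes extend or restrict them; a real compiler would also have to handle variable-length paths, optional matches and bag semantics, none of which are covered here.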
“…(4) Create an incremental view for the FRA expression. Incremental view maintenance algorithms for FRA are well studied both from a theoretical perspective [2,4,10,11] and implementation-wise, with many practical tools [12,33] and research prototypes [15,26,31]. While they are not expressible in first-order logic, it is possible to evaluate transitive operations incrementally [3,23].…”
Graph processing challenges are common in modern database systems, with the property graph data model gaining widespread adoption [29]. Due to the novelty of the field, graph databases and frameworks typically provide their own query language, such as Cypher for Neo4j [27], Gremlin for TinkerPop [28] and GraphScript for SAP HANA [24]. These languages often lack a formal background for their data model and semantics [1]. To address this, the openCypher initiative [21] aims to standardise a subset of the Cypher language, for which it currently provides a grammar specification and a set of acceptance tests to allow vendors to implement their own openCypher-compatible engines.
Incremental view maintenance has been used for decades in relational database systems [4]. In the graph domain, numerous use cases rely on complex queries and require low latency, including financial fraud detection, source code analysis [32] and checking integrity (or well-formedness) constraints in databases [30]. While these could benefit from incremental evaluation, currently no property graph system provides incremental views. Our research investigates incremental view maintenance for openCypher queries. A key challenge is that the property graph data model includes lists and maps, and queries can return arbitrarily nested data structures.
We propose three desirable properties for an incremental property graph query engine: (IVM) incremental view maintenance, (FGN) fine-granularity update operations on nested data structures, and (ORD) ordering. Previous research showed that IVM and FGN are possible [19]. However, as stated in [8], "incremental view maintenance [IVM] strategies for data models that preserve order [ORD] remain an open problem to date". While removing support for ordering might seem a plausible workaround, it would pose serious limitations: (1) queries that require top-k results are common [17] and (2) even more importantly, Cypher handles paths as an alternating list of vertices and edges, which must be kept ordered.
Therefore, we investigate the following research question: Which practical fragment of the openCypher language is incrementally maintainable?
“…In Section 4.3, we defined steps to translate queries to an FRA query plan to allow evaluation with existing relational IVM algorithms such as [21,32,33,48,61,64,65]. However, the rich set of operators required by PG queries necessitates the combination of multiple techniques.…”
The property graph data model of modern graph database systems is increasingly adapted for storing and processing heterogeneous datasets like networks. Many challenging applications with near real-time requirements (e.g. financial fraud detection, recommendation systems, and on-the-fly validation) can be captured with graph queries, which are evaluated repeatedly. To ensure quick response time for a changing data set, these applications would benefit from applying incremental view maintenance (IVM) techniques, which can perform continuous evaluation of queries and calculate the changes in the result set upon updates. However, currently, no graph databases provide support for incremental views. While IVM problems have been studied extensively over relational databases, views on property graph queries require operators outside the scope of standard relational algebra. Hence, tackling this problem requires the integration of numerous existing IVM techniques and possibly further extensions. In this paper, we present an approach to perform IVM on property graphs, using a nested relational algebraic representation for property graphs and graph operations. Then we define a chain of transformations to reduce most property graph queries to flat relational algebra and use techniques from discrimination networks (used in rule-based expert systems) to evaluate them. We demonstrate the approach using our prototype tool, ingraph, which uses openCypher, an open graph query language specified as part of an industry initiative. However, several aspects of our approach can be generalised to other graph query languages such as G-CORE and PGQL.
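As an illustration of what "calculating the changes in the result set upon updates" can look like, the Python sketch below incrementally maintains a view of all two-hop paths, roughly the result of a query such as `MATCH (a)-->(b)-->(c) RETURN a, b, c`. It is not the ingraph implementation and does not use discrimination networks; the class `TwoHopView` and its methods are hypothetical names. It also assumes set semantics with at most one edge per vertex pair, which is a simplification of the property graph model.

```python
from collections import defaultdict


class TwoHopView:
    """Maintains {(a, b, c) : a->b and b->c are edges} under edge updates."""

    def __init__(self):
        self.out = defaultdict(set)   # vertex -> successors
        self.inc = defaultdict(set)   # vertex -> predecessors
        self.rows = set()             # materialised view

    def _delta(self, x, y):
        # Paths that use edge (x, y) as their first or their second hop.
        first = {(x, y, z) for z in self.out[y]}
        second = {(w, x, y) for w in self.inc[x]}
        return first | second

    def insert_edge(self, x, y):
        if y in self.out[x]:
            return set()              # edge already present: nothing changes
        self.out[x].add(y)
        self.inc[y].add(x)
        delta = self._delta(x, y)     # computed after the index update
        self.rows |= delta
        return delta

    def delete_edge(self, x, y):
        if y not in self.out[x]:
            return set()
        delta = self._delta(x, y)     # computed before the index update
        self.out[x].discard(y)
        self.inc[y].discard(x)
        self.rows -= delta
        return delta

    def recompute(self):
        # Full re-evaluation, used only to cross-check the incremental result.
        return {(a, b, c)
                for a, bs in list(self.out.items())
                for b in bs
                for c in self.out.get(b, ())}


view = TwoHopView()
view.insert_edge(1, 2)
added = view.insert_edge(2, 3)      # creates the path 1 -> 2 -> 3
view.insert_edge(3, 1)
removed = view.delete_edge(2, 3)    # removes every path that used 2 -> 3
```

Each update touches only the rows that involve the changed edge, instead of re-running the whole join. The operators outside standard relational algebra that the abstract mentions, such as nested data and ordering, are exactly what this flat sketch leaves out.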