Spark SQL

Armbrust, Michael; Xin, Reynold; Cheng, Lei; Huai, Yin; Liu, Davies; Bradley, Joseph K.; Meng, Xiangrui; Kaftan, Tomer; Franklin, Michael J.; Ghodsi, Ali; Zaharia, Matei

doi:10.1145/2723372.2742797

Cited by 936 publications

(120 citation statements)

References 27 publications

Supporting

Mentioning: 117

Contrasting

Unclassified


“…We therefore prepared two queries, named q 7 and q 8 in Spark SQL [2] using Left-Outer-Join, that query the same results as q 7 and q 8 respectively, in order to 1) validate the correctness of our parallel-efficient queries generation, 2) compare the performance of our solution to an industrial solution also under Spark implementation. Our solution is slower than Spark SQL for simple queries, e.g.…”

Section: Methods

mentioning

confidence: 99%


Let High-level Graph Queries Be Parallel Efficient: An Approach Over Structural Recursion On Pregel

Tung, Meng, et al. 2016

Journal of Information Processing


Graphs play an important role today in managing big data. Supporting declarative graph queries is one of the most crucial parts of efficiently manipulating graph databases. Structural recursion has been studied for graph querying and graph transformations. However, most previous studies of graph structural recursion do not exploit the power of parallel computing in practice. The bulk semantics, which is used for parallel evaluation of structural recursion, still imposes many constraints that limit the performance of parallel querying. In this paper, we propose a framework that systematically generates structural recursive functions from high-level declarative graph queries; the generated functions are then evaluated efficiently by our framework on top of the Pregel model. Our solution therefore reduces the complexity of developing efficient structural recursive functions.


Section: Methods

mentioning

confidence: 99%

“…While N(2007,{}) satisfies conditions in line 2, line 5 and line 6, we thus apply line 7 and obtain: f_2({2007 : $gv_2}) = if isempty($gv_2) then {OK : {}} else {}; f_2({$l : $gv_2}) = {}, where entry point is f_2.…”

mentioning

confidence: 99%

Let High-level Graph Queries Be Parallel Efficient: An Approach Over Structural Recursion On Pregel

Tung, Meng, et al. 2016

Journal of Information Processing


“…Compilers usually convert input programs, given as text strings, into an Intermediate Representation (IR) which contains all essential information available about the program after parsing. Optimizing compilers use IRs to facilitate the definition and application of optimizations.…”

Section: Intermediate Representation

mentioning

confidence: 99%

“…Recently, query compilation has returned to the limelight, with commercial systems such as StreamBase, IBM Spade, Microsoft's Hekaton, Cloudera Impala, and MemSQL employing it. Academic research has also intensified [33,2,52,56,64,53,54,55,50,84,19,62,44,6].…”

Section: Introduction

mentioning

confidence: 99%

How to Architect a Query Compiler

Shaikhha, Klonatos, Parreaux, et al. 2016

Proceedings of the 2016 International Conference on Management of Data


This paper studies architecting query compilers. The state of the art in query compiler construction is lagging behind that in the compilers field. We attempt to remedy this by exploring the key causes of technical challenges in need of well founded solutions, and by gathering the most relevant ideas and approaches from the PL and compilers communities for easy digestion by database researchers. All query compilers known to us are more or less monolithic template expanders that do the bulk of the compilation task in one large leap. Such systems are hard to build and maintain. We propose to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems. We attempt to derive our advice for creating such DSL stacks from widely acceptable principles. We have also re-created a well-known query compiler following these ideas and report on this effort.
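The idea of lowering a query through several levels of abstraction can be illustrated with a minimal Python sketch. It is not the authors' system: the node names (`Scan`, `Select`, `Project`), the two lowering functions, and the sample table are all hypothetical, chosen only to show a relational plan being lowered first to a loop-level description and then to executable source text.

```python
from dataclasses import dataclass

# Level 1: a tiny relational DSL (hypothetical node names).
@dataclass
class Scan:
    table: str

@dataclass
class Select:
    child: object
    column: str
    op: str        # a Python comparison operator such as ">" or "=="
    value: object

@dataclass
class Project:
    child: object
    columns: list

def lower_to_loops(plan):
    """Lowering step 1: relational plan -> flat loop-level description."""
    loop = {"table": None, "filters": [], "columns": None}
    node = plan
    while not isinstance(node, Scan):
        if isinstance(node, Project):
            # Keep only the outermost projection.
            if loop["columns"] is None:
                loop["columns"] = node.columns
        elif isinstance(node, Select):
            loop["filters"].append((node.column, node.op, node.value))
        node = node.child
    loop["table"] = node.table
    return loop

def lower_to_source(loop):
    """Lowering step 2: loop-level description -> Python source text."""
    conds = " and ".join(
        f"row[{c!r}] {op} {v!r}" for c, op, v in loop["filters"]
    ) or "True"
    if loop["columns"] is None:
        out = "dict(row)"
    else:
        out = "{" + ", ".join(f"{c!r}: row[{c!r}]" for c in loop["columns"]) + "}"
    return (
        "def query(tables):\n"
        "    result = []\n"
        f"    for row in tables[{loop['table']!r}]:\n"
        f"        if {conds}:\n"
        f"            result.append({out})\n"
        "    return result\n"
    )

def compile_query(plan):
    """Run both lowering steps and turn the generated source into a function."""
    namespace = {}
    exec(lower_to_source(lower_to_loops(plan)), namespace)
    return namespace["query"]

plan = Project(Select(Scan("people"), "age", ">", 30), ["name"])
tables = {"people": [{"name": "Ann", "age": 41}, {"name": "Bob", "age": 25}]}
query = compile_query(plan)
print(query(tables))  # [{'name': 'Ann'}]
```

Because each step consumes and produces a small, inspectable representation, an optimization can be added at one level without touching the others, which is the maintainability argument the abstract makes against one-leap template expansion.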


“…The operators over relational data provide a simple object-relational mapping that makes it easy to specify wrappers to the underlying RDBMS. More recently, in the context of the cloud, Spark SQL [1] has been proposed as an Apache Spark module to provide tight integration between relational and procedural processing through a declarative API that integrates relational operators with procedural Spark code, taking advantage of massive parallelism. Similarly to LINQ, Spark SQL can map to relations arbitrary Java objects as well as different data sources.…”

Section: Introduction

mentioning

confidence: 99%

CloudMdsQL: querying heterogeneous cloud data stores with a common language

Kolev, Valduriez, Bondiombouy, et al. 2015

Distrib Parallel Databases


The blooming of different cloud data management infrastructures, specialized for different kinds of data and tasks, has led to a wide diversification of DBMS interfaces and the loss of a common programming paradigm. In this paper, we present the design of a cloud multidatastore query language (CloudMdsQL), and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations to each data store's native query interface. The query engine has a fully distributed architecture, which provides important opportunities for optimization. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g. a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized, e.g. by pushing down select predicates, using bind join, performing join ordering, or planning intermediate data shipping. Our experimental validation, with three data stores (graph, document and relational) and representative queries, shows that CloudMdsQL satisfies the five important requirements for a cloud multidatastore query language.
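As a rough illustration of the bind join optimization mentioned in the abstract, the following Python sketch joins rows from one source with a second, simulated data store while shipping only the rows whose keys actually match. It does not reflect CloudMdsQL's implementation: `bind_join`, `fetch_orders`, the `ORDERS` list, and the `shipped` log are hypothetical stand-ins.

```python
def bind_join(left_rows, left_key, fetch_right, right_key):
    """Join left_rows with a remote source, shipping only matching right rows."""
    # Collect the distinct join-key values seen on the left side.
    keys = {row[left_key] for row in left_rows}
    # Push the keys down so the right store filters locally
    # (comparable to a `WHERE key IN (...)` predicate).
    right_rows = fetch_right(keys)
    # Build a hash index over the already reduced right side.
    index = {}
    for r in right_rows:
        index.setdefault(r[right_key], []).append(r)
    # Probe the index with each left row to produce joined rows.
    return [{**l, **r} for l in left_rows for r in index.get(l[left_key], [])]

# Hypothetical stand-in for a second data store.
ORDERS = [
    {"customer_id": 1, "item": "book"},
    {"customer_id": 2, "item": "lamp"},
    {"customer_id": 3, "item": "desk"},
]
shipped = []  # records which rows crossed the simulated network

def fetch_orders(keys):
    rows = [o for o in ORDERS if o["customer_id"] in keys]
    shipped.extend(rows)
    return rows

customers = [{"id": 1, "name": "Ann"}, {"id": 3, "name": "Cy"}]
joined = bind_join(customers, "id", fetch_orders, "customer_id")
print(joined)
print(len(shipped))  # 2 of the 3 order rows were shipped
```

The benefit grows with the selectivity of the left side: the fewer distinct keys it produces, the less intermediate data the second store has to ship.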
