Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data 2015
DOI: 10.1145/2723372.2742797

Spark SQL

Abstract: Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g., declarative queries and optimized storage), and lets SQL users call complex analytics libraries in Spark (e.g., machine learning). Compared to previous systems, Spark SQL makes two main additions. First, it offers much tighter integration between relational and procedura…

Cited by 936 publications (120 citation statements)
References: 27 publications
“…We therefore prepared two queries, named q 7 and q 8 in Spark SQL [2] using Left-Outer-Join, that query the same results as q 7 and q 8 respectively, in order to 1) validate the correctness of our parallel-efficient queries generation, 2) compare the performance of our solution to an industrial solution also under Spark implementation. Our solution is slower than Spark SQL for simple queries, e.g.…”
Section: Methods
mentioning
confidence: 99%
“…While N(2007, {}) satisfies conditions in line 2, line 5 and line 6, we thus apply line 7 and obtain: f_2({2007 : $gv_2}) = if isempty($gv_2) then {OK : {}} else {}; f_2({$l : $gv_2}) = {}, where entry point is f_2.…”
mentioning
confidence: 99%
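The two-case function quoted in this statement can be read as a small pattern match on a labelled tree. The sketch below is only an illustration of that reading, not the citing paper's implementation. It assumes a tree is encoded as a single-entry Python dict `{label: subtree}`, that `isempty` means "the subtree is the empty dict", and that the first case takes priority over the second; the name `f2` and this encoding are hypothetical.

```python
def f2(tree: dict) -> dict:
    """Illustrative reading of the quoted definition of f_2.

    Assumption: `tree` is a single-entry dict {label: subtree}.
    """
    # Unpack the only (label, subtree) pair of the tree.
    (label, gv2), = tree.items()
    if label == 2007:
        # First case: f_2({2007 : $gv_2}) = if isempty($gv_2) then {OK : {}} else {}
        return {"OK": {}} if not gv2 else {}
    # Second case: f_2({$l : $gv_2}) = {} for any other label $l.
    return {}


# The node N(2007, {}) from the quote, under the assumed encoding.
result = f2({2007: {}})
```

Under this encoding, the node N(2007, {}) matches the first case with an empty subtree, so `result` is `{"OK": {}}`.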
“…Compilers usually convert input programs, given as text strings, into an Intermediate Representation (IR) which contains all essential information available about the program after parsing 6 . Optimizing compilers use IRs to facilitate the definition and application of optimizations.…”
Section: Intermediate Representation
mentioning
confidence: 99%
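The idea in this statement, that a compiler turns program text into an intermediate representation (IR) and then optimizes over that IR, can be sketched in a few lines. This is a toy example under stated assumptions, not the IR of any system cited here: the input language is limited to `+`-separated integers and names, and the node types `Const`, `Var`, `Add` and the functions `parse` and `fold_constants` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, Add]


def parse(text: str) -> Expr:
    """Convert a '+'-separated string into a left-associated IR tree."""
    tokens = [t.strip() for t in text.split("+")]
    nodes = [Const(int(t)) if t.isdigit() else Var(t) for t in tokens]
    tree = nodes[0]
    for node in nodes[1:]:
        tree = Add(tree, node)
    return tree


def fold_constants(expr: Expr) -> Expr:
    """Optimization pass: replace Add(Const, Const) with a single Const."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(left.value + right.value)
        return Add(left, right)
    return expr


ir = parse("1 + 2 + x")          # Add(Add(Const(1), Const(2)), Var("x"))
optimized = fold_constants(ir)   # Add(Const(3), Var("x"))
```

The optimization is written against the IR nodes rather than the input string, which is the separation the quoted passage describes: parsing produces the tree once, and each pass is a tree-to-tree function.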
“…Recently, query compilation has returned to the limelight, with commercial systems such as StreamBase, IBM Spade, Microsoft's Hekaton, Cloudera Impala, and MemSQL employing it. Academic research has also intensified [33,2,52,56,64,53,54,55,50,84,19,62,44,6].…”
Section: Introduction
mentioning
confidence: 99%
“…The operators over relational data provide a simple object-relational mapping that makes it easy to specify wrappers to the underlying RDBMS. More recently, in the context of the cloud, Spark SQL [1] has been proposed as an Apache Spark module to provide tight integration between relational and procedural processing through a declarative API that integrates relational operators with procedural Spark code, taking advantage of massive parallelism. Similarly to LINQ, Spark SQL can map to relations arbitrary Java objects as well as different data sources.…”
Section: Introduction
mentioning
confidence: 99%