Apache Calcite

Begoli, Edmon; Camacho-Rodríguez, Jesús; Hyde, Julian; Mior, Michael J.; Lemire, Daniel

doi:10.1145/3183713.3190662

Cited by 86 publications (15 citation statements)

References 28 publications

“…The relational algebra that underpins our processing within a database [34], has no equivalent yet in dataset search. Recently, Apache released information about the query processing system used for many of the Apache products including Hive and Storm, and [20] investigated how the relational algebra can be applied to data contained within the various data processing frameworks in the Apache suite. Alternatively, other recent work in query processing attempts to handle non-relational operators via adaptive query processing [76].…”

Section: Database Building Blocks (mentioning; confidence: 99%)

“…These limitations impact the use of the retrieved data -machine learning can be unduly affected by the processing that was performed over a dataset prior to its release [125], while knowing the original purpose for collecting the data aids interpretation and analysis [140]. In other words, in a dataset search context, approaches need to consider additional aspects such as data provenance [27,53,64,87,101,142], annotations [67,93,144], quality [116,131,148], granularity of content [81], and schema [9,20] to effectively evaluate a dataset's fitness for a particular use. The user does not have the ability to introspect over large amounts of data, and their attention must be prioritized [13].…”

Section: Introduction (mentioning; confidence: 99%)

Dataset search: a survey

et al. 2019

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently released a beta search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to resolving these open problems, as well as immediate next steps that will take the field forward.

“…Apache Calcite [8] is a dynamic data management framework licensed by the Apache Foundation, supports the SQL language and its corresponding extensions. Calcite has complete query processing capabilities and can support a variety of common functions across different data management systems.…”

Section: Apache Calcite (mentioning; confidence: 99%)

Research on SQL statement optimization rules for relational heterogeneous database

Zhang

Zhao

2023

Second International Symposium on Computer Applications and Information Systems (ISCAIS 2023)

In the single-type relational database scenario, SQL statement optimization rules can effectively shorten statement execution time. In the heterogeneous database scenario, the effectiveness of these rules needs further exploration. Here, the heterogeneous relational databases are connected through Calcite, and predicate push-down, constant transfer, and sub-query de-nesting rules are used to optimize SQL statements. The experiments show that the optimized SQL statements effectively shorten execution time in the relational heterogeneous database scenario.
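Two of the rule families named in this abstract, constant transfer and predicate push-down, can be illustrated with a minimal Python sketch. It is not Calcite's implementation or API, nor the authors' code: the `Pred` type and the `transfer_constants` and `push_down` functions are hypothetical, and the sketch assumes a conjunctive WHERE clause over a two-table join whose predicates are simple equalities between qualified columns (such as `a.x`) or between a column and an integer constant. Constant transfer derives `b.y = 5` from `a.x = b.y AND a.x = 5`; push-down then attaches every single-table predicate to that table's scan and keeps only cross-table predicates at the join. Sub-query de-nesting is not covered.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Hypothetical predicate form "left = right": each side is either a qualified
# column such as "a.x" or an integer constant. Constant predicates are assumed
# to be normalized column-first ("a.x = 5", never "5 = a.x").
Operand = Union[str, int]


@dataclass(frozen=True)
class Pred:
    left: Operand
    right: Operand


def is_col(v: Operand) -> bool:
    return isinstance(v, str)


def table_of(col: str) -> str:
    # "a.x" -> "a"
    return col.split(".", 1)[0]


def transfer_constants(preds: list[Pred]) -> list[Pred]:
    """Constant transfer: from c1 = c2 and c1 = k, derive c2 = k (to a fixpoint)."""
    result = list(preds)
    changed = True
    while changed:
        changed = False
        # Columns currently known to equal a constant.
        consts = {p.left: p.right for p in result
                  if is_col(p.left) and not is_col(p.right)}
        for p in list(result):
            if not (is_col(p.left) and is_col(p.right)):
                continue
            # Equality is symmetric, so propagate in both directions.
            for src, dst in ((p.left, p.right), (p.right, p.left)):
                if src in consts:
                    derived = Pred(dst, consts[src])
                    if derived not in result:
                        result.append(derived)
                        changed = True
    return result


def push_down(preds: list[Pred]) -> tuple[dict[str, list[Pred]], list[Pred]]:
    """Assign single-table predicates to that table's scan; keep the rest at the join."""
    per_table: dict[str, list[Pred]] = {}
    join_preds: list[Pred] = []
    for p in preds:
        tables = {table_of(v) for v in (p.left, p.right) if is_col(v)}
        if len(tables) == 1:
            per_table.setdefault(tables.pop(), []).append(p)
        else:
            join_preds.append(p)
    return per_table, join_preds


# WHERE a.x = b.y AND a.x = 5
where = [Pred("a.x", "b.y"), Pred("a.x", 5)]
filters, join = push_down(transfer_constants(where))
print(filters)  # {'a': [Pred(left='a.x', right=5)], 'b': [Pred(left='b.y', right=5)]}
print(join)     # [Pred(left='a.x', right='b.y')]
```

Running constant transfer before push-down is what lets table `b` be filtered at its own scan rather than only after the join, which is the kind of reduction in intermediate data that such rules aim for in a heterogeneous setting.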

“…We plan to add more heuristic query optimizations to Relational Playground. For example, Apache Calcite [2] contains more than 100 optimization rules. Although many of these are likely too complex for our setting, we expect that several of these rules will prove useful.…”

Section: Future Work (mentioning; confidence: 99%)

Relational Playground: Teaching the Duality of Relational Algebra and SQL

Mior

2023

Proceedings of the 2nd International Workshop on Data Systems Education: Bridging Education Practice With Education Research

Self Cite

Students in introductory data management courses are often taught how to write queries in SQL. This is a useful and practical skill, but it gives limited insight into how queries are processed by relational database engines. In contrast, relational algebra is a commonly used internal representation of queries by database engines, but can be challenging for students to grasp. We developed a tool we call Relational Playground for database students to explore the connection between relational algebra and SQL.
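The duality between SQL and relational algebra, together with the heuristic rule-based rewriting mentioned in the Future Work snippet, can be sketched in a few lines of Python. This is not Relational Playground's or Calcite's code: the node classes (`Relation`, `Select`, `Project`), the `merge_selects` rule and the `rewrite` driver are hypothetical. A query is modeled as an algebra tree, printed both as relational algebra and as an equivalent SQL string, and one heuristic rule, σ[p](σ[q](R)) becomes σ[p AND q](R), is applied bottom-up until the tree stops changing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Select:
    # sigma: keep rows satisfying cond. Conditions are assumed to be simple
    # comparisons without OR, so joining them with AND needs no parentheses.
    cond: str
    child: Node


@dataclass(frozen=True)
class Project:
    # pi: keep only the listed columns
    cols: tuple[str, ...]
    child: Node


Node = Union[Relation, Select, Project]
Rule = Callable[[Node], Optional[Node]]


def to_algebra(n: Node) -> str:
    if isinstance(n, Relation):
        return n.name
    if isinstance(n, Select):
        return f"σ[{n.cond}]({to_algebra(n.child)})"
    return f"π[{', '.join(n.cols)}]({to_algebra(n.child)})"


def to_sql(n: Node) -> str:
    """Render an optional Project over Selects over one Relation as SELECT-FROM-WHERE.

    Uses SQL's bag semantics; a set-semantics π would need SELECT DISTINCT.
    """
    cols = "*"
    if isinstance(n, Project):
        cols, n = ", ".join(n.cols), n.child
    conds = []
    while isinstance(n, Select):
        conds.append(n.cond)
        n = n.child
    if not isinstance(n, Relation):
        raise ValueError("sketch only handles project/select over one relation")
    where = f" WHERE {' AND '.join(conds)}" if conds else ""
    return f"SELECT {cols} FROM {n.name}{where}"


def merge_selects(n: Node) -> Optional[Node]:
    """Heuristic rule: σ[p](σ[q](R)) => σ[p AND q](R); None if it does not apply."""
    if isinstance(n, Select) and isinstance(n.child, Select):
        return Select(f"{n.cond} AND {n.child.cond}", n.child.child)
    return None


def rewrite(n: Node, rules: list[Rule]) -> Node:
    """Apply rules bottom-up, repeating until no rule changes the tree (a fixpoint)."""
    # Rewrite children first so parents see already-simplified inputs.
    if isinstance(n, Select):
        n = Select(n.cond, rewrite(n.child, rules))
    elif isinstance(n, Project):
        n = Project(n.cols, rewrite(n.child, rules))
    # Try each rule at this node; after a successful rewrite, start over on the result.
    for rule in rules:
        out = rule(n)
        if out is not None:
            return rewrite(out, rules)
    return n


q = Project(("name",), Select("age > 30", Select("dept = 'CS'", Relation("emp"))))
print(to_algebra(q))    # π[name](σ[age > 30](σ[dept = 'CS'](emp)))
opt = rewrite(q, [merge_selects])
print(to_algebra(opt))  # π[name](σ[age > 30 AND dept = 'CS'](emp))
print(to_sql(opt))      # SELECT name FROM emp WHERE age > 30 AND dept = 'CS'
```

Both the original and the rewritten tree render to the same SQL, which is the point of the rule: it changes the algebraic plan, not the query's meaning. Because `merge_selects` strictly reduces the number of `Select` nodes, the rewrite loop always terminates; a real optimizer with many interacting rules needs more care to guarantee that.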
