2019
DOI: 10.14778/3342263.3342644
Neo: A Learned Query Optimizer

Abstract: Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that re…

Cited by 178 publications (43 citation statements)
References 30 publications
“…Figure 3 shows the reward and memory usage behavior of each variant (omitting "Q", which performed too poorly to be graphed on the same axes as the other variants) and the CPython GC. While many reinforcement learning techniques can require hours or days to train [20,22,29], each optimized learned GC variant is able to learn a competitive policy quickly, often within seconds. The second row of Figure 3 shows memory usage over time, tracked via the CPython heap.…”
Section: Methods
confidence: 99%
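The excerpt above describes learned GC variants that train quickly against a user-defined reward rather than only minimizing collector time. As a rough, hedged illustration of that general idea (not the cited system's algorithm), a minimal epsilon-greedy sketch over CPython's `gc.set_threshold` might look like this; the candidate thresholds, toy workload, and reward weighting are all invented for illustration:

```python
import gc
import random
import time

# Hypothetical sketch: an epsilon-greedy bandit picks a CPython gen-0 GC
# threshold, then receives a reward based on elapsed workload time.
# ACTIONS, the workload, and the reward are illustrative assumptions.
ACTIONS = [400, 700, 2000, 10000]       # candidate gen-0 thresholds
values = {a: 0.0 for a in ACTIONS}      # running mean reward per action
counts = {a: 0 for a in ACTIONS}
EPSILON = 0.2


def workload():
    """Toy allocation-heavy phase standing in for a real program."""
    data = [[i] * 10 for i in range(20000)]
    return len(data)


def step():
    # Epsilon-greedy selection over GC thresholds.
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: values[a])
    gc.set_threshold(action)

    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start

    # User-defined reward (here: faster runs are better), as opposed to
    # only minimizing time spent inside the collector.
    reward = -elapsed
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]
    return action, reward


for _ in range(20):
    step()

best = max(ACTIONS, key=lambda a: values[a])
```

Because the policy is a small table updated incrementally, it can adapt within seconds of wall-clock time, which is consistent with the excerpt's observation that hours of training are not required for this class of problem.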
“…While these techniques can be helpful for applications where trace information is available, both works (1) require post-hoc analysis of program traces, and do not automatically adapt their policies in real time, and (2) minimize time spent in GC mechanisms, as opposed to optimizing a user-defined reward function (e.g., request latency). Many previous works have applied reinforcement learning to various systems problems, including query optimization [22,28], cluster scheduling [20], stream processing [29], software debloating [8], and cloud provisioning [21,27].…”
Section: Related Work
confidence: 99%
“…Further speedups may result from (1) implementing our tree library in a native language rather than in Python, and (2) switching Woodblock's learning algorithm to a distributed learner [14].…”
[Figure 9: A Woodblock-produced top-performing qd-tree for TPC-H; the number after each legend entry indicates the total number of cuts on that column (or advanced cut).]
Section: Time To Produce Layouts
confidence: 99%
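The qd-tree excerpt above partitions data by routing rows through per-column "cuts." As a toy, hedged sketch of that routing idea (the column names and cut values below are illustrative placeholders, not taken from the cited paper):

```python
# Hypothetical minimal qd-tree sketch: each internal node cuts on one
# column, routing rows left or right until they land in a leaf block.
class Node:
    def __init__(self, column=None, value=None, left=None, right=None):
        self.column = column      # None for a leaf block
        self.value = value
        self.left = left
        self.right = right
        self.rows = []            # rows stored at a leaf


def route(node, row):
    """Follow cuts until a leaf block is reached, then store the row."""
    while node.column is not None:
        node = node.left if row[node.column] < node.value else node.right
    node.rows.append(row)
    return node


# Two cuts: first on p_size, then on p_brand (toy values).
tree = Node("p_size", 10,
            left=Node(),
            right=Node("p_brand", 3, left=Node(), right=Node()))

for row in [{"p_size": 5, "p_brand": 1},
            {"p_size": 20, "p_brand": 2},
            {"p_size": 30, "p_brand": 7}]:
    route(tree, row)
```

Counting how many internal nodes cut on each column yields per-column totals like those in the figure legend; queries whose predicates match a cut can then skip entire leaf blocks.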
“…There has been a host of work on query optimization in DBMS, including access path selection [46], join optimization [47,48] and, recently, machine learning methods [49-51]. These focus on access path cost models for, e.g., main-memory concurrent systems [46], heuristics for join [47] and group-by [48] re-ordering, learned indices [50,51] or optimizers [49,52-54]. Our algorithms and techniques are complementary to the prior work, to incorporate bounded evaluation into DBMS query optimization.…”
Section: R(|x → Y N |)
confidence: 99%