Chris Jermaine scite author profile

This paper describes query processing in the DBO database system. Like other database systems designed for ad hoc analytic processing, DBO can compute the exact answer to queries over a large relational database in a scalable fashion. Unlike any other system designed for analytic processing, DBO continuously maintains an estimate of the final answer to an aggregate query throughout execution, along with statistically meaningful bounds on the estimate's accuracy. As DBO gathers more information, the estimate becomes more accurate, until it is exact when the query completes. This lets users stop execution as soon as they are satisfied with the accuracy, and it encourages exploratory data analysis.
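
The idea of a running estimate whose bounds shrink to zero at completion can be illustrated with a minimal sketch. This is not DBO's algorithm: it assumes a single in-memory column scanned in uniformly random order, a normal-approximation interval, and a hypothetical function name `online_mean`.

```python
import math
import random


def online_mean(values, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) while scanning in random order.

    Illustrative assumption: rows are visited in a uniformly random order,
    so the rows seen so far form a simple random sample without replacement.
    """
    rows = list(values)
    random.Random(seed).shuffle(rows)
    n_total = len(rows)
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for n, x in enumerate(rows, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            half_width = math.inf  # no variance estimate from one row
        else:
            sample_var = m2 / (n - 1)
            # Finite population correction: reaches 0 once every row is seen,
            # so the interval collapses onto the exact answer.
            fpc = (n_total - n) / n_total
            half_width = z * math.sqrt(sample_var / n * fpc)
        yield n, mean, half_width


if __name__ == "__main__":
    for n, est, hw in online_mean(range(1, 101)):
        if n % 25 == 0:
            print(f"rows={n:3d} estimate={est:.2f} +/- {hw:.2f}")
```

A caller can break out of the loop as soon as `half_width` is small enough, which mirrors the stop-at-any-time behaviour the abstract describes.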

Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches

Cormode

et al. 2011

Online aggregation for large MapReduce jobs

et al. 2011

In online aggregation, a database system processes a user's aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, with confidence bounds that tighten over time. In this paper, we consider how online aggregation can be built into a MapReduce system for large-scale data processing. Given the MapReduce paradigm's close relationship with cloud computing (in that one might expect a large fraction of MapReduce jobs to be run in the cloud), online aggregation is a very attractive technology. Since large-scale cloud computations are typically pay-as-you-go, a user can monitor the accuracy obtained in an online fashion and then save money by killing the computation early once sufficient accuracy has been obtained.
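
The pay-as-you-go argument can be made concrete with a small sketch of early termination. It is not the paper's method: it assumes each data block yields one partial sum, that blocks finish in uniformly random order, and it ignores any scheduling effects a real MapReduce cluster would introduce. The name `run_until_accurate` and its parameters are hypothetical.

```python
import math
import random


def run_until_accurate(block_sums, rel_error=0.05, z=1.96, seed=0):
    """Estimate a grand total from per-block sums, stopping early.

    Returns (estimate_of_total, blocks_processed). Processing stops once the
    interval half-width is at most rel_error * |estimate|.
    """
    order = list(block_sums)
    random.Random(seed).shuffle(order)  # assumed: random completion order
    total_blocks = len(order)
    seen = []
    for s in order:
        seen.append(s)
        n = len(seen)
        if n < 2:
            continue  # need two blocks for a variance estimate
        mean = sum(seen) / n
        var = sum((v - mean) ** 2 for v in seen) / (n - 1)
        fpc = 1 - n / total_blocks  # finite population correction
        estimate = total_blocks * mean  # scale the block mean up to a total
        half_width = z * total_blocks * math.sqrt(var / n * fpc)
        if estimate != 0 and half_width <= rel_error * abs(estimate):
            return estimate, n
    return float(sum(order)), total_blocks


if __name__ == "__main__":
    blocks = [100.0 + (i % 7) for i in range(200)]
    est, used = run_until_accurate(blocks)
    print(f"estimate={est:.1f} after {used} of {len(blocks)} blocks")
```

The fraction of blocks left unprocessed is a rough proxy for the compute cost saved by killing the job early.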

Turbo-charging estimate convergence in DBO

et al. 2009

DBO is a database system that uses randomized algorithms to give statistically meaningful estimates of the final answer to a multi-table, disk-based query from start to finish during query execution. However, DBO's "time 'til utility" (or "TTU"; that is, the time until DBO can give a useful estimate) can be overly large, particularly when many database tables are joined in a query, when a join query includes a very selective predicate on one or more of the tables, or when the data are skewed. In this paper, we describe Turbo DBO, a prototype database system that, like DBO, can answer multi-table join queries in a scalable fashion but often has a much lower TTU. The key innovation of Turbo DBO is its use of novel algorithms that look for and remember "partial match" tuples in a randomized fashion. These are tuples that satisfy some of the Boolean predicates associated with the query and may later be grown into tuples that contribute to the final query result.

Bayesian specification learning for finding API usage errors

Murali

Chaudhuri

Jermaine

2017

Bridging the Gap between Response Time and Energy-Efficiency in Broadcast Schedule Design

Yee

Navathe

Omiecinski

et al. 2002

Efficient data allocation over multiple channels at broadcast servers

Yee

Navathe

Omiecinski

et al. 2002

IEEE Trans. Comput.

108

Broadcast is a scalable way of disseminating data because broadcasting an item satisfies all outstanding client requests for it. However, because the transmission medium is shared, individual requests may have high response times. In this paper, we show how to minimize the average response time given multiple broadcast channels by optimally partitioning data among them. We also offer an approximation algorithm that is less complex than the optimal algorithm and show that its performance is near-optimal for a wide range of parameters. Finally, we briefly discuss the extensibility of our work with two simple, yet seldom-researched extensions, namely, handling varying-sized items and generating single-channel schedules.
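
To make the partitioning problem concrete, here is a minimal sketch under a simplified model that is an assumption of this example, not necessarily the paper's formulation: items have unit size, each channel cyclically broadcasts its items in a flat schedule (so the expected wait on a channel holding N items is N/2 item slots), and only contiguous groups of the popularity-sorted items are considered. The function name `partition_channels` is hypothetical.

```python
def partition_channels(probs, k):
    """Split items among k channels to minimize average expected wait.

    probs: access probability of each item; k: number of channels (k <= items).
    Returns (average_expected_wait, groups), where groups lists the
    probabilities assigned to each channel, most popular channel first.
    """
    p = sorted(probs, reverse=True)
    n = len(p)
    prefix = [0.0]
    for x in p:
        prefix.append(prefix[-1] + x)

    def cost(i, j):
        # Channel holding items i..j-1: wait (j - i) / 2, weighted by demand.
        return (j - i) / 2 * (prefix[j] - prefix[i])

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):          # number of channels used so far
        for j in range(c, n + 1):      # items 0..j-1 placed on c channels
            for i in range(c - 1, j):  # last channel takes items i..j-1
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    cut[c][j] = i

    groups = []
    j = n
    for c in range(k, 0, -1):          # walk the recorded cuts backwards
        i = cut[c][j]
        groups.append(p[i:j])
        j = i
    groups.reverse()
    return best[k][n], groups


if __name__ == "__main__":
    wait, groups = partition_channels([0.4, 0.3, 0.1, 0.1, 0.05, 0.05], 2)
    print(wait, groups)
```

In this model, giving the few most popular items a short channel of their own lowers the average wait, at the expense of longer waits for rarely requested items.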

Chris Jermaine

Conditional Anomaly Detection

Scalable approximate query processing with the DBO engine
