Roee Ebenstein scite author profile

Roee Ebenstein

5 Publications

1 Citation Statement Received

66 Citation Statements Given

Affiliations

Google (United States), The Ohio State University

Publications

Procella

et al. 2019

Large organizations like YouTube are dealing with exploding data volumes and increasing demand for data-driven applications. Broadly, these applications can be categorized as reporting and dashboarding, embedded statistics in pages, time-series monitoring, and ad-hoc analysis. Typically, organizations build specialized infrastructure for each of these use cases. This, however, creates silos of data and processing, and results in complex, expensive, and harder-to-maintain infrastructure. At YouTube, we solved this problem by building a new SQL query engine, Procella. Procella implements a superset of the capabilities required to address all four use cases above, with high scale and performance, in a single product. Today, Procella serves hundreds of billions of queries per day across all four workloads at YouTube and several other Google product areas.

DSDQuery DSI - Querying scientific data repositories with structured operators

Ebenstein

Agrawal

2015

Scientific data is often distributed through repositories that host a large number of files in formats such as NetCDF or HDF5. With recent and anticipated increases in the size of observational and simulation data, it is important to transport only the data of interest from a large distributed dataset. Unfortunately, existing portals provide limited querying interfaces, typically a set of predefined, hard-coded subsetting options, which limits users' querying flexibility. This paper describes a system that addresses this gap. Relational algebra is adapted for scientific array querying, allowing a subset of SQL to be used in this domain; this enables nuanced subsetting conditions to be applied to a set of dataset files within a repository. A query processing algorithm extracts and collects data from the relevant datasets, based on metadata previously extracted by an automatic metadata extraction engine. Finally, the system stitches together a new structured NetCDF file that is returned as the result set, allowing the returned data to be used and analyzed by existing tools. The system has been extensively evaluated to show its ability to handle increasing data volumes and numbers of files.

FluxQuery

Ebenstein

Kamat

Nandi

2016

Modern computing devices and user interfaces have necessitated highly interactive querying. Some of these interfaces issue a large number of dynamically changing and continuous queries to the backend. In others, users expect to inspect results during the query formulation process, to guide them toward specifying a full-fledged query. Thus, users end up issuing a fast-changing workload to the underlying database. In such situations, the user's query intent can be thought of as being in flux. In this paper, we show that traditional query execution engines are not well-suited for this new class of highly interactive workloads. We propose a novel model to interpret the variability of likely queries in a workload. We implemented a cyclic scan-based approach to process queries from such workloads in an efficient and practical manner while reducing the overall system load. We evaluate and compare our methods with traditional systems and demonstrate the scalability of our approach, which enables thousands of queries to run simultaneously within interactive response times with low memory and CPU requirements.

DistriPlan

Ebenstein

Agrawal

2017

FDQ: Advance Analytics Over Real Scientific Array Datasets

Ebenstein

Agrawal

Wang

et al. 2018

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.