Atreyee Dey scite author profile

Atreyee Dey

4Publications

37Citation Statements Received

27Citation Statements Given

How they've been cited

How they cite others

Affiliations

IBM Research - India, Indian Institute of Science Bangalore

Publications

Order By: Most citations

Efficiently approximating query optimizer plan diagrams

Dey

Bhaumik

et al. 2008

Proc. VLDB Endow.

View full text Add to dashboard Cite

Given a parametrized n-dimensional SQL query template and a choice of query optimizer, a plan diagram is a color-coded pictorial enumeration of the execution plan choices of the optimizer over the query parameter space. These diagrams have proved to be a powerful metaphor for the analysis and redesign of modern optimizers, and are gaining currency in diverse industrial and academic institutions. However, their utility is adversely impacted by the impractically large computational overheads incurred when standard bruteforce exhaustive approaches are used for producing fine-grained diagrams on high-dimensional query templates. In this paper, we investigate strategies for efficiently producing close approximations to complex plan diagrams. Our techniques are customized to the features available in the optimizer's API, ranging from the generic optimizers that provide only the optimal plan for a query, to those that also support costing of sub-optimal plans and enumerating rank-ordered lists of plans. The techniques collectively feature both random and grid sampling, as well as inference techniques based on nearest-neighbor classifiers, parametric query optimization and plan cost monotonicity. Extensive experimentation with a representative set of TPC-H and TPC-DS-based query templates on industrial-strength optimizers indicates that our techniques are capable of delivering 90% accurate diagrams while incurring less than 15% of the computational overheads of the exhaustive approach. In fact, for full-featured optimizers, we can guarantee zero error with less than 10% overheads. These approximation techniques have been implemented in the publicly available Picasso optimizer visualization tool.

show abstract

On the stability of plan costs and the costs of plan stability

et al. 2010

View full text Add to dashboard Cite

Predicate selectivity estimates are subject to considerable run-time variation relative to their compile-time estimates, often leading to poor plan choices that cause inflated response times. We present here a parametrized family of plan generation and selection algorithms that replace, whenever feasible, the optimizer's solely costconscious choice with an alternative plan that is (a) guaranteed to be near-optimal in the absence of selectivity estimation errors, and (b) likely to deliver comparatively stable performance in the presence of arbitrary errors. These algorithms have been implemented within the PostgreSQL optimizer, and their performance evaluated on a rich spectrum of TPC-H and TPC-DS-based query templates in a variety of database environments. Our experimental results indicate that it is indeed possible to identify robust plan choices that substantially curtail the adverse effects of erroneous selectivity estimates. In fact, the plan selection quality provided by our algorithms is often competitive with those obtained through apriori knowledge of the plan search and optimality spaces. The additional computational overheads incurred by the replacement approach are miniscule in comparison to the expected savings in query execution times. We also demonstrate that with appropriate parameter choices, it is feasible to directly produce anorexic plan diagrams, a potent objective in query optimizer design.

show abstract

Exploiting evidence from unstructured data to enhance master data management

et al. 2012

View full text Add to dashboard Cite

Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today's MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, those unstructured data sources may contain valuable information about the same entities known to MDM from the structured data sources. Integrating information from unstructured data into MDM is challenging as textual references to existing MDM entities are often incomplete and imprecise and the additional entity information extracted from text should not impact the trustworthiness of MDM data.In this paper, we present an architecture for making MDM text-aware and showcase its implementation as IBM InfoSphere MDM Extension for Unstructured Text Correlation, an add-on to IBM InfoSphere Master Data Management Standard Edition. We highlight how MDM benefits from additional evidence found in documents when doing entity resolution and relationship discovery. We experimentally demonstrate the feasibility of integrating information from unstructured data sources into MDM.

show abstract

Fast Mining of Interesting Phrases from Subsets of Text Corpora

Deepak

Dey

Majumdar

2014

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Atreyee Dey

Efficiently approximating query optimizer plan diagrams

On the stability of plan costs and the costs of plan stability

Exploiting evidence from unstructured data to enhance master data management

Fast Mining of Interesting Phrases from Subsets of Text Corpora

Contact Info

Product

Resources

About