Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.
DOI: 10.1109/ssdm.2004.1311234

AutoPart: automating schema design for large scientific databases using data partitioning

Abstract: Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. To improve query execution performance, modern DBMS build indexes and materialized views on the wide tables that store experimental data. The replication of data in indexes and views, however, implies large amounts of additional storage space, and incurs high update costs as new experiments add or change large volumes of data. In this paper we explore automatic data pa…

Cited by 58 publications (68 citation statements)
References 16 publications (9 reference statements)
“…Microsoft's AutoAdmin finds sets of candidate attributes for individual queries and then attempts to merge them based on the entire workload [5]. The AutoPart tool identifies conflicting access patterns on tables and creates read-only vertical partitions from disjoint column subsets that are similar to our secondary indexes [36]. Further heuristics can then be applied to prune this candidate set or combine attributes into multi-attribute sets [5].…”
Section: Related Work
confidence: 99%
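The "disjoint column subsets" mentioned in the statement above can be sketched as a small workload-driven routine. This is a minimal illustration, not AutoPart's actual algorithm: the attribute names and workload are hypothetical, and two attributes share a fragment exactly when the same set of queries references both, which guarantees disjoint vertical partitions.

```python
from collections import defaultdict

def vertical_partitions(table_attrs, workload):
    """workload: one set of referenced attributes per query."""
    access = defaultdict(set)            # attribute -> ids of queries touching it
    for qid, attrs in enumerate(workload):
        for a in attrs & set(table_attrs):
            access[a].add(qid)
    groups = defaultdict(list)           # identical access pattern -> one fragment
    for a in table_attrs:
        groups[frozenset(access[a])].append(a)
    return [sorted(g) for g in groups.values()]

# Hypothetical astronomy-style table and three template queries.
attrs = ["id", "ra", "dec", "flux", "notes"]
workload = [{"id", "ra", "dec"}, {"id", "flux"}, {"ra", "dec"}]
print(vertical_partitions(attrs, workload))
```

Because every attribute lands in exactly one fragment, no column is replicated, which is the property the citing authors contrast with index- and view-based designs.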
“…Many of the existing techniques for automatic database partitioning, however, are tailored for large-scale analytical applications (i.e., data warehouses) [36,40]. These approaches are based on the notion of data declustering [28], where the goal is to spread data across nodes to maximize intra-query parallelism [5,10,39,49].…”
Section: Introduction
confidence: 99%
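The declustering idea referenced above, spreading data across nodes so a single query can be answered by all of them in parallel, can be sketched with a simple hash scheme. The node count and row format below are hypothetical, not from any cited system:

```python
def decluster(rows, key, n_nodes):
    """Spread rows across n_nodes by hashing the partitioning key."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[hash(row[key]) % n_nodes].append(row)
    return nodes

# Six hypothetical rows land evenly on three nodes, so a full scan
# can proceed on all nodes at once (intra-query parallelism).
placement = decluster([{"k": i} for i in range(6)], "k", 3)
print([len(node) for node in placement])
```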
“…Then, in order to limit the search space they prune the set of candidates. Similar procedures are used in other works, such as AutoPart [15], which is focused on scientific workloads. In this case only vertical and categorical partitioning are considered.…”
Section: Effect Of Imbalance Factor and Data Correlation
confidence: 99%
“…BigTable [5] and PNUTS [7] use range-based partitioning on the keys, which is still too simple for our reference queries. In general, the complexity of scientific workloads makes it hard to design a good partitioning strategy manually, so automatic partitioning is preferred [15].…”
Section: Effect Of Imbalance Factor and Data Correlation
confidence: 99%
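Range-based partitioning on keys, as attributed above to BigTable and PNUTS, can be sketched with sorted split points where each key range is served by one partition. The split values are hypothetical:

```python
import bisect

def tablet_for(key, splits):
    """Return the index of the key range holding `key`.
    `splits` are sorted inclusive lower bounds: range i+1 starts at splits[i]."""
    return bisect.bisect_right(splits, key)

# Hypothetical split points give ranges (-inf, "g"), ["g", "p"), ["p", +inf).
splits = ["g", "p"]
print(tablet_for("a", splits), tablet_for("g", splits), tablet_for("z", splits))
```

A range lookup touches only the partitions overlapping the queried interval, which is why this scheme suits key scans but, as the citing authors note, can be too simple for complex scientific queries.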
“…DBProxy [4] observed that most applications issue template-based queries and these queries have the same structure that contains different string or numeric constraints. AutoPart [14] deals with large scientific databases where the continuous insertions limit the application of indexes and materialized views. For optimization purposes, their algorithm horizontally and vertically partitions the tables in the original large database according to a representative workload using a single node.…”
Section: Automated Physical Design Solutions
confidence: 99%
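The horizontal side of such workload-driven partitioning can be sketched by fragmenting rows along the selection predicates of a representative workload. This is a toy illustration, not AutoPart's actual cost-based procedure; the predicate and rows are hypothetical:

```python
def horizontal_partitions(rows, predicates):
    """Each row goes to the fragment of its first matching workload
    predicate; rows matching no predicate fall into a catch-all fragment."""
    fragments = [[] for _ in predicates] + [[]]
    for row in rows:
        for i, pred in enumerate(predicates):
            if pred(row):
                fragments[i].append(row)
                break
        else:
            fragments[-1].append(row)
    return fragments

# One hypothetical workload predicate splits four rows into two fragments.
rows = [{"year": y} for y in (1999, 2003, 2004, 2010)]
preds = [lambda r: r["year"] < 2004]
print([len(f) for f in horizontal_partitions(rows, preds)])  # → [2, 2]
```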