2015
DOI: 10.1109/tkde.2015.2416725
Improving Accuracy and Robustness of Self-Tuning Histograms by Subspace Clustering

Abstract: In large databases, the amount and the complexity of the data call for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms look at query-execution results to refine themselves. An assumption with such histograms, which has not been questioned so far, is that they can learn the dataset from scratch, that is …
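The mechanism the abstract refers to, a histogram that refines itself from query feedback, can be illustrated with a minimal sketch. The class name SelfTuningHistogram1D, the damping factor alpha, and the proportional error-distribution rule below are assumptions chosen for illustration; they show the general query-feedback idea, not the subspace-clustering initialization this paper proposes.

```python
# Minimal sketch of a query-feedback ("self-tuning") histogram, assuming an
# equi-width 1-D layout and a proportional error-distribution rule (both are
# illustrative choices, not the paper's method).

class SelfTuningHistogram1D:
    def __init__(self, lo, hi, n_buckets, total_rows):
        self.lo, self.hi = lo, hi
        self.width = (hi - lo) / n_buckets
        # Start from a uniform guess: the histogram has no notion of the data yet.
        self.freq = [total_rows / n_buckets] * n_buckets

    def _overlap(self, i, a, b):
        """Fraction of bucket i covered by the query range [a, b)."""
        b_lo = self.lo + i * self.width
        b_hi = b_lo + self.width
        inter = max(0.0, min(b, b_hi) - max(a, b_lo))
        return inter / self.width

    def estimate(self, a, b):
        """Estimated cardinality of a range query, assuming uniformity inside buckets."""
        return sum(f * self._overlap(i, a, b) for i, f in enumerate(self.freq))

    def refine(self, a, b, true_card, alpha=0.5):
        """After the query ran, spread the observed error over the touched buckets."""
        error = true_card - self.estimate(a, b)
        touched = [(i, w) for i in range(len(self.freq))
                   if (w := self._overlap(i, a, b)) > 0]
        total_w = sum(w for _, w in touched)
        for i, w in touched:
            self.freq[i] = max(0.0, self.freq[i] + alpha * error * w / total_w)

# Usage: estimate, execute the query to get the exact answer, then feed it back.
h = SelfTuningHistogram1D(lo=0, hi=100, n_buckets=10, total_rows=10_000)
print(h.estimate(20, 40))        # initial (uniform) guess
h.refine(20, 40, true_card=3_500)
print(h.estimate(20, 40))        # moves towards the observed cardinality
```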

Cited by 15 publications (7 citation statements); references 24 publications (45 reference statements).
“…Traditional CardEst methods, such as histograms [27] and sampling [13,16,18], are widely applied in DBMSs and are generally based on simplified assumptions and expert-designed heuristics. Many histogram variants [1,3,6,8,9,15,23,25,29,30,33,34] have since been proposed to enhance their performance. Sampling-based variants include query-driven kernel-based methods [13,16], index-based methods [18] and random-walk-based methods [19,41].…”
Section: Related Work
confidence: 99%
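As a companion to the histogram sketch above, the other traditional CardEst family named in this statement, sampling, can be sketched in a few lines. The toy table, the predicate, and the sample size are illustrative assumptions; real systems use more elaborate schemes (e.g., index-based or random-walk sampling).

```python
# Minimal sketch of sampling-based cardinality estimation: evaluate the predicate
# on a uniform sample and scale the observed selectivity to the table size.
import random

def sample_cardinality_estimate(rows, predicate, sample_size=1_000, seed=0):
    """Estimate |{r in rows : predicate(r)}| from a uniform sample of the table."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    selectivity = sum(predicate(r) for r in sample) / len(sample)
    return selectivity * len(rows)

# Usage on a toy two-column table: sampled estimate vs. exact count.
table = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(100_000)]
pred = lambda r: r[0] < 30 and r[1] > 50
print(sample_cardinality_estimate(table, pred))   # estimate from 1,000 rows
print(sum(pred(r) for r in table))                # exact cardinality
```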
“…Inspired by [34], to fully mine the unique complementary information provided by the different views, co-regularization is introduced into problem (6). This centroid-based approach pushes the representations of the different views towards a common centroid.…”
Section: A Problem Formulation
confidence: 99%
“…Following the developments of recent years, researchers have put forward a variety of subspace clustering algorithms. According to how the subspaces are represented, the existing subspace clustering algorithms can be divided into four main types: statistical methods [2], iterative methods [3][4], algebraic methods [5][6][7] and spectral-type methods [8][9]. Statistical methods, such as mixtures of probabilistic principal component analyzers (MPPCA) [2], need to know the number of subspaces and their dimensions in advance, and when the data and the noise are not Gaussian distributed, the solution is not optimal.…”
Section: Introduction
confidence: 99%
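Of the four families listed in this statement, the spectral-type methods are the easiest to sketch: each point is expressed as a sparse combination of the other points, and the resulting affinity matrix is clustered spectrally. The toy data, the lasso penalty alpha, and the number of clusters below are illustrative assumptions, in the spirit of sparse subspace clustering rather than any specific cited algorithm.

```python
# Minimal sketch of a spectral-type subspace clustering pipeline.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Toy data: 60 points from two 1-D subspaces (lines) in R^3, plus small noise.
basis1, basis2 = rng.normal(size=3), rng.normal(size=3)
X = np.vstack([np.outer(rng.normal(size=30), basis1),
               np.outer(rng.normal(size=30), basis2)]) + 0.01 * rng.normal(size=(60, 3))

# Self-expressive coefficients: x_i ≈ X^T c_i with c_ii = 0 and c_i sparse.
n = X.shape[0]
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    lasso = Lasso(alpha=0.01, max_iter=10_000)
    lasso.fit(X[others].T, X[i])          # columns = other points, target = point i
    C[i, others] = lasso.coef_

# Symmetric affinity and spectral clustering into the two subspaces.
A = np.abs(C) + np.abs(C).T
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(labels)    # points from the same subspace should share a label
```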
“…Kim et al. [12] proposed representing subspaces with an elastic net, a new scheme that relies on the singular values of elastic-net regularization. Tang et al. [13] used the search method of the k-Nearest Neighbors (k-NN) algorithm, which is important for machine learning and computer vision applications. Khachatryan et al. [14] showed a significant improvement in the self-tuning technique by initializing its configuration; further, to enhance robustness and accuracy in self-tuning, clusters of dense subspaces in data projections were proposed.…”
Section: Introduction
confidence: 99%
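The last sentence of this statement, finding dense clusters in data projections to improve the accuracy and robustness of self-tuning histograms, can also be illustrated with a hedged sketch: project the data onto a chosen attribute subset and look for dense regions there with DBSCAN. The attribute subset, eps, min_samples, and the idea of seeding buckets from the regions are illustrative assumptions, not the exact procedure of the paper.

```python
# Minimal sketch: dense regions in a low-dimensional projection of the data.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

# Toy table with 5 attributes; only attributes 0 and 2 carry two dense blobs.
n = 500
X = rng.uniform(0, 10, size=(n, 5))
X[:250, [0, 2]] = rng.normal(loc=2.0, scale=0.3, size=(250, 2))
X[250:, [0, 2]] = rng.normal(loc=7.0, scale=0.3, size=(250, 2))

subspace = [0, 2]                     # the projection (subspace) under consideration
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X[:, subspace])

# Each non-noise label is a dense region in this projection; such regions could,
# for example, seed histogram buckets instead of starting from an uninformed layout.
for lab in sorted(set(labels) - {-1}):
    members = X[labels == lab][:, subspace]
    print(f"dense region {lab}: {members.shape[0]} rows, "
          f"bounds {members.min(axis=0).round(2)} .. {members.max(axis=0).round(2)}")
```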