2015
DOI: 10.1109/tkde.2015.2416725
Improving Accuracy and Robustness of Self-Tuning Histograms by Subspace Clustering

Abstract: In large databases, the amount and the complexity of the data call for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms look at query-execution results to refine themselves. An assumption with such histograms, which has not been questioned so far, is that they can learn the dataset from scratch, that is …
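The mechanism the abstract refers to, a histogram that refines itself from query feedback, can be illustrated with a minimal sketch. The class name SelfTuningHistogram1D, the damping factor alpha, and the proportional error-distribution rule below are assumptions chosen for illustration; they show the general query-feedback idea, not the subspace-clustering initialization this paper proposes.

```python
# Minimal sketch of a query-feedback ("self-tuning") histogram, assuming an
# equi-width 1-D layout and a proportional error-distribution rule (both are
# illustrative choices, not the paper's method).

class SelfTuningHistogram1D:
    def __init__(self, lo, hi, n_buckets, total_rows):
        self.lo, self.hi = lo, hi
        self.width = (hi - lo) / n_buckets
        # Start from a uniform guess: the histogram has no notion of the data yet.
        self.freq = [total_rows / n_buckets] * n_buckets

    def _overlap(self, i, a, b):
        """Fraction of bucket i covered by the query range [a, b)."""
        b_lo = self.lo + i * self.width
        b_hi = b_lo + self.width
        inter = max(0.0, min(b, b_hi) - max(a, b_lo))
        return inter / self.width

    def estimate(self, a, b):
        """Estimated cardinality of a range query, assuming uniformity inside buckets."""
        return sum(f * self._overlap(i, a, b) for i, f in enumerate(self.freq))

    def refine(self, a, b, true_card, alpha=0.5):
        """After the query ran, spread the observed error over the touched buckets."""
        error = true_card - self.estimate(a, b)
        touched = [(i, w) for i in range(len(self.freq))
                   if (w := self._overlap(i, a, b)) > 0]
        total_w = sum(w for _, w in touched)
        for i, w in touched:
            self.freq[i] = max(0.0, self.freq[i] + alpha * error * w / total_w)

# Usage: estimate, execute the query to get the exact answer, then feed it back.
h = SelfTuningHistogram1D(lo=0, hi=100, n_buckets=10, total_rows=10_000)
print(h.estimate(20, 40))        # initial (uniform) guess
h.refine(20, 40, true_card=3_500)
print(h.estimate(20, 40))        # moves towards the observed cardinality
```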

Cited by 15 publications (7 citation statements); references 24 publications (45 reference statements).
“…Traditional CardEst methods, such as histograms [27] and sampling [13,16,18], are widely applied in DBMSs and are generally based on simplified assumptions and expert-designed heuristics. Many histogram variants [1,3,6,8,9,15,23,25,29,30,33,34] have since been proposed to enhance their performance. Sampling-based variants include query-driven kernel-based methods [13,16], index-based methods [18] and random-walk-based methods [19,41].…”
Section: Related Work
confidence: 99%
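As a companion to the histogram sketch above, the other traditional CardEst family named in this statement, sampling, can be sketched in a few lines. The toy table, the predicate, and the sample size are illustrative assumptions; real systems use more elaborate schemes (e.g., index-based or random-walk sampling).

```python
# Minimal sketch of sampling-based cardinality estimation: evaluate the predicate
# on a uniform sample and scale the observed selectivity to the table size.
import random

def sample_cardinality_estimate(rows, predicate, sample_size=1_000, seed=0):
    """Estimate |{r in rows : predicate(r)}| from a uniform sample of the table."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    selectivity = sum(predicate(r) for r in sample) / len(sample)
    return selectivity * len(rows)

# Usage on a toy two-column table: sampled estimate vs. exact count.
table = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(100_000)]
pred = lambda r: r[0] < 30 and r[1] > 50
print(sample_cardinality_estimate(table, pred))   # estimate from 1,000 rows
print(sum(pred(r) for r in table))                # exact cardinality
```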
“…Inspired by [34], to fully mine the unique complementary information provided by the different views, co-regularization is introduced into problem (6). This centroid-based approach pushes the representations of the different views towards a common centroid.…”
Section: A Problem Formulation
confidence: 99%
“…Following the developments of recent years, researchers have put forward a variety of subspace clustering algorithms. According to how the subspaces are represented, the existing subspace clustering algorithms can be divided into four main types: statistical methods [2], iterative methods [3][4], algebraic methods [5][6][7] and spectral-type methods [8][9]. Statistical methods, such as mixtures of probabilistic principal component analyzers (MPPCA) [2], need to know the number of subspaces and their dimensions in advance, and when the data and the noise are not Gaussian distributed, the solution is not optimal.…”
Section: Introduction
confidence: 99%
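Of the four families listed in this statement, the spectral-type methods are the easiest to sketch: each point is expressed as a sparse combination of the other points, and the resulting affinity matrix is clustered spectrally. The toy data, the lasso penalty alpha, and the number of clusters below are illustrative assumptions, in the spirit of sparse subspace clustering rather than any specific cited algorithm.

```python
# Minimal sketch of a spectral-type subspace clustering pipeline.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Toy data: 60 points from two 1-D subspaces (lines) in R^3, plus small noise.
basis1, basis2 = rng.normal(size=3), rng.normal(size=3)
X = np.vstack([np.outer(rng.normal(size=30), basis1),
               np.outer(rng.normal(size=30), basis2)]) + 0.01 * rng.normal(size=(60, 3))

# Self-expressive coefficients: x_i ≈ X^T c_i with c_ii = 0 and c_i sparse.
n = X.shape[0]
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(np.arange(n), i)
    lasso = Lasso(alpha=0.01, max_iter=10_000)
    lasso.fit(X[others].T, X[i])          # columns = other points, target = point i
    C[i, others] = lasso.coef_

# Symmetric affinity and spectral clustering into the two subspaces.
A = np.abs(C) + np.abs(C).T
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(labels)    # points from the same subspace should share a label
```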
“…Kim et al. [12] proposed representing subspaces with an elastic net, a new scheme that relies on the singular values of elastic-net regularization. Tang et al. [13] used the search method of the k-Nearest Neighbors (k-NN) algorithm, which is important for machine learning and computer vision applications. Khachatryan et al. [14] showed a significant improvement in the self-tuning technique by initializing its configuration; further, to enhance robustness and accuracy in self-tuning, clusters of dense subspaces in data projections were proposed.…”
Section: Introduction
confidence: 99%
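The last sentence of this statement, finding dense clusters in data projections to improve the accuracy and robustness of self-tuning histograms, can also be illustrated with a hedged sketch: project the data onto a chosen attribute subset and look for dense regions there with DBSCAN. The attribute subset, eps, min_samples, and the idea of seeding buckets from the regions are illustrative assumptions, not the exact procedure of the paper.

```python
# Minimal sketch: dense regions in a low-dimensional projection of the data.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

# Toy table with 5 attributes; only attributes 0 and 2 carry two dense blobs.
n = 500
X = rng.uniform(0, 10, size=(n, 5))
X[:250, [0, 2]] = rng.normal(loc=2.0, scale=0.3, size=(250, 2))
X[250:, [0, 2]] = rng.normal(loc=7.0, scale=0.3, size=(250, 2))

subspace = [0, 2]                     # the projection (subspace) under consideration
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X[:, subspace])

# Each non-noise label is a dense region in this projection; such regions could,
# for example, seed histogram buckets instead of starting from an uninformed layout.
for lab in sorted(set(labels) - {-1}):
    members = X[labels == lab][:, subspace]
    print(f"dense region {lab}: {members.shape[0]} rows, "
          f"bounds {members.min(axis=0).round(2)} .. {members.max(axis=0).round(2)}")
```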