Summary
With the increasing complexity of graph queries, query cost estimation has become a key challenge in graph databases. Accurate estimates are critical for database administrators and database management systems when they process or optimize queries. An efficient and accurate estimation model improves estimation quality and makes the produced results credible. Although learning-based methods have been applied to query cost estimation, most of them target relational queries and cannot be used directly for graph queries. Furthermore, most estimation approaches focus on the correlations between predicates or columns and ignore both the dependencies between the query schema and the query filter conditions and the correlation between query schemas. In this study, we construct a novel deep learning model, composed of reasoning and retrieval processes, that can accurately capture the potential logical relationships in graph queries, which addresses these problems to some extent. In addition, we propose a query estimation framework that divides the estimation task into four stages: query workload generation, training data collection, feature extraction and encoding, and estimation model construction. Experiments on real-world datasets show that our estimation model improves estimation quality and outperforms the other deep learning models we compared in terms of estimation accuracy.
Clustering is a widely used unsupervised learning technique. However, the number of clusters often has to be entered manually, and this number has a great impact on the clustering result. Researchers have proposed algorithms for determining the number of clusters, but these do not perform well on data sets with complex and scattered shapes. To solve these problems, this paper proposes using the Gaussian kernel density estimation function to determine the maximum number of clusters, using the change in the center point score to obtain the candidate set of center points, and then using the change in the minimum distance between center points to obtain the number of clusters. Experiments show the validity and practicability of the proposed algorithm.
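As a rough illustration of the first step only, the sketch below estimates a Gaussian kernel density at every data point and counts the points that are local density maxima, treating that count as an upper bound on the number of clusters. The abstract does not give the paper's actual procedure, so the peak rule, the function names `gaussian_kde_density` and `count_density_peaks`, and the `bandwidth` and `radius` parameters are all assumptions made for this example.

```python
import math


def gaussian_kde_density(points, bandwidth):
    """Return an unnormalised Gaussian kernel density estimate at each point.

    `bandwidth` is the kernel width; its choice is an assumption of this sketch.
    """
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / len(points))
    return densities


def count_density_peaks(points, bandwidth, radius):
    """Count points that have the highest density within `radius` of themselves.

    Hypothetical rule, not taken from the paper: a point is a peak if no other
    point within `radius` has a higher density. Ties are broken by index so
    that a group of equally dense neighbours yields exactly one peak.
    """
    densities = gaussian_kde_density(points, bandwidth)
    peaks = 0
    for i, p in enumerate(points):
        is_peak = True
        for j, q in enumerate(points):
            if i == j:
                continue
            if math.dist(p, q) <= radius and (densities[j], j) > (densities[i], i):
                is_peak = False
                break
        if is_peak:
            peaks += 1
    return peaks


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (made-up data).
    group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
    group_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9)]
    print(count_density_peaks(group_a + group_b, bandwidth=0.3, radius=1.0))
```

The later steps (center point scores and the minimum distance between center points) are not sketched here, because the abstract does not define them.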