Efficient Group K Nearest-Neighbor Spatial Query Processing in Apache Spark

Moutafis, Panagiotis; Mavrommatis, George; Vassilakopoulos, Michael; Corral, Antonio

doi:10.3390/ijgi10110763

Cited by 4 publications

(3 citation statements)

References 45 publications

“…Voronoi diagram (VD) plays the role as a very effective tool in computing geometries. In recent years, VDs have been widely used in spatial databases to describe spatial neighbor relationships; they also are used for realization of a spatial neighbor query, spatial interpolation, and buffer analysis (Moutafis et al, 2021).…”

Section: Voronoi Diagram (mentioning)

confidence: 99%

A Novel Query Method for Spatial Database Based on Improved K-Nearest Neighbor Algorithm

Xia, Xue. 2023. International Journal of Decision Support System Technology

A spatial database is a database of spatial information and the core component of geographic information systems (GIS). To address the problem that the time complexity of k-nearest neighbor (kNN) query algorithms is proportional to the scale of the training samples, an efficient query method for spatial databases based on the Spark framework and the reversed k-nearest neighbor (RkNN) is proposed. First, on top of the Spark framework, a two-layer index structure based on a grid and a Voronoi diagram is constructed, and efficient filtering and refining algorithms are proposed. Second, the filtering step of the proposed algorithm is used to obtain the candidates, and the refining step is used to remove the candidates. Finally, the candidate sets from different regions are merged to get the final result. Experiments on real-world datasets show that the proposed method has better query performance and better stability and significantly improves the processing speed.
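The filter, refine, and merge pattern described in this abstract can be illustrated with a minimal single-machine sketch. It is not the cited paper's algorithm: it uses a plain grid only (no Voronoi layer, no Spark), and the names `rknn`, `closer_count`, and `cell_size` are hypothetical. It assumes the usual monochromatic definition, where a point p belongs to RkNN(q) when fewer than k other data points are strictly closer to p than q is.

```python
from collections import defaultdict
from math import dist, floor


def closer_count(i, q, idxs, points, limit):
    """Count points (by index in idxs, excluding i) strictly closer to
    points[i] than q is, stopping early once `limit` is reached."""
    d_pq = dist(points[i], q)
    count = 0
    for j in idxs:
        if j != i and dist(points[i], points[j]) < d_pq:
            count += 1
            if count >= limit:
                break
    return count


def rknn(points, q, k, cell_size=1.0):
    """Reverse kNN of q over 2D points, processed one grid cell at a time."""
    # Partition point indices into square grid cells (the "regions").
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(floor(x / cell_size), floor(y / cell_size))].append(i)

    everyone = range(len(points))
    result = []
    for members in cells.values():
        # Filter: a point with k closer neighbors inside its own cell
        # cannot be an answer, so it is discarded using local data only.
        candidates = [i for i in members
                      if closer_count(i, q, members, points, k) < k]
        # Refine: verify the surviving candidates against the full dataset.
        result.extend(points[i] for i in candidates
                      if closer_count(i, q, everyone, points, k) < k)
    # Merge: results from all cells are combined into one sorted list.
    return sorted(result)
```

The filter is safe because the count inside one cell is a lower bound on the count over the whole dataset, so the answer does not depend on the chosen `cell_size`; only the amount of work left for the refine step does.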

“…Because Hadoop MapReduce is a disk-based distributed computing framework, its response time is relatively slow, so it is not suitable for online queries. Therefore, more and more researchers have proposed spatial query processing algorithms based on the Spark framework [4, 5, 6, 7, 24, 25, 26, 27, 28], which is an in-memory computing framework with a faster processing speed. Xie [6] proposed a spatial big data analysis system based on Spark, which expanded the Spark SQL engine and supported the construction of an RDD-based memory index, thus effectively supporting various query operations such as range query and kNN query.…”

Section: Related Work (mentioning)

confidence: 99%

“…Compared with the original Spark, SparkNN significantly improves the average query time. Moutafis [26] proposed the first distributed GKNN query algorithm in Apache Spark, and this method proved to be more efficient than Apache Hadoop.…”

Section: Related Work (mentioning)

confidence: 99%

A PID-Based kNN Query Processing Algorithm for Spatial Data

Qiao, Chen, et al. 2022. Sensors

As a popular spatial operation, the k-Nearest Neighbors (kNN) query is widely used in various spatial application systems. How to efficiently process a kNN query on spatial big data has always been an important research topic in the field of spatial data management. The centralized solutions are not suitable for spatial big data due to their poor scalability, while the existing distributed solutions are not efficient enough to meet the high real-time requirements of some spatial applications. Therefore, we introduce the Proportional Integral Derivative (PID) control technology into kNN query processing and propose a PID-based kNN query processing algorithm (PIDKNN) for spatial big data based on Spark. In this algorithm, the whole data space is divided into grid cells of the same size using the grid partition method, and the grid-based index is constructed. On this basis, the grid-based density peak clustering algorithm is used to cluster spatial data, and the corresponding PID parameters are set for each cluster. When performing kNN queries, the PID algorithm is used to estimate the radius growth step size of kNN queries, thereby realizing kNN query processing with a variable query radius growth step based on a feedback mechanism, which greatly improves the efficiency of kNN query processing. A series of experimental results show that the PIDKNN algorithm has good performance and scalability and is superior to the existing parallel kNN query processing methods.
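The feedback idea in this abstract, growing the query radius by a PID-controlled step until enough neighbors are found, can be sketched as follows. This is only an illustration under stated assumptions, not the PIDKNN algorithm itself: it scans all points instead of using a grid index or clustering, it runs on one machine rather than on Spark, and the function name `pid_knn`, the gains `kp`, `ki`, `kd`, the starting radius `r0`, and `min_step` are hypothetical values chosen for the example.

```python
from math import dist


def pid_knn(points, q, k, r0=1.0, kp=0.5, ki=0.1, kd=0.05, min_step=1e-3):
    """Return (k nearest points to q, final radius, number of rounds)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    radius = r0
    integral = 0.0      # accumulated error (I term)
    prev_error = None   # previous error, used for the D term
    rounds = 0
    while True:
        rounds += 1
        inside = [p for p in points if dist(p, q) <= radius]
        if len(inside) >= k:
            # Every point outside the circle is farther than any point
            # inside, so the k nearest inside are the true k nearest.
            inside.sort(key=lambda p: dist(p, q))
            return inside[:k], radius, rounds

        # Error signal: how many neighbors are still missing.
        error = k - len(inside)
        integral += error
        derivative = 0.0 if prev_error is None else error - prev_error
        step = kp * error + ki * integral + kd * derivative
        # Clamp the step so the radius always grows and the loop ends.
        radius += max(step, min_step)
        prev_error = error
```

For example, `pid_knn([(0, 0), (3, 0), (0, 4), (10, 10)], (0, 0), 3)` returns the three points nearest the origin as its first element. A large shortfall in neighbors produces a large step, and the step shrinks as the count approaches k, which is the variable growth step the abstract refers to.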

Defining and designing spatial queries: the role of spatial relationships

Carniel. 2023. Geo-spatial Information Science
