Due to the fast growth of new technology applications such as social media analysis, web data analysis and medical information network analysis, various types of data must now be processed frequently, and managing and analyzing such large amounts of data effectively has become a vital goal. To reduce the data processing, time and space complexity of Big Data workloads, this paper studies the k-nearest neighbor (kNN) join operation. Given a point p and a set of points S, the kNN operation finds the k points in S closest to p; a kNN join performs this search for every point of a second set R. It is a computationally intensive task with a wide range of applications, such as knowledge discovery and data mining. As the volume and dimensionality of the data increase, only distributed approaches can perform this costly operation in a reasonable time. Recent works have therefore focused on implementing efficient solutions with the MapReduce programming model, because it is well suited to distributed, large-scale data processing. Although these works provide different solutions to the same problem, each one has particular constraints and properties. This paper compares the existing approaches for computing kNN on MapReduce. To make the solutions comparable, it first breaks kNN computation on MapReduce into three generic steps: 1) data processing, 2) data partitioning and 3) computation. The experiments use a variety of datasets and analyze the impact of data volume, data dimension and the value of k from several perspectives, such as time complexity, space complexity and accuracy.
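To make the kNN join and the partition-then-compute structure concrete, here is a minimal single-process sketch of one naive strategy. It is an illustration only, not any of the specific approaches the paper compares. It assumes Euclidean distance and points given as tuples of floats, and it assumes a round-robin split of both R and S into `n` blocks. Every R block is paired with every S block, giving n × n simulated map tasks that each compute local candidates. A reduce step then merges those candidates into the global top k for each point of R. All function names (`partition`, `map_phase`, `reduce_phase`, `knn_join`) are hypothetical, and this naive strategy uses no pre-processing step.

```python
import heapq
import math
from collections import defaultdict
from itertools import product


def knn(p, S, k):
    """Brute-force kNN: the k points of S closest to p, as (distance, index) pairs."""
    return heapq.nsmallest(k, ((math.dist(p, s), j) for j, s in enumerate(S)))


def partition(points, n):
    """Partitioning step (assumed round-robin): split (index, point) pairs into n blocks."""
    blocks = [[] for _ in range(n)]
    for i, p in enumerate(points):
        blocks[i % n].append((i, p))
    return blocks


def map_phase(r_block, s_block, k):
    """Map: for each r in the R block, emit its local k nearest neighbours within the S block."""
    for i, r in r_block:
        local = heapq.nsmallest(k, ((math.dist(r, s), j) for j, s in s_block))
        yield i, local


def reduce_phase(candidate_lists, k):
    """Reduce: merge the local candidate lists of one r into its global top k."""
    return heapq.nsmallest(k, (c for local in candidate_lists for c in local))


def knn_join(R, S, k, n=2):
    """Exact kNN join of R and S, simulating one MapReduce round in memory."""
    r_blocks, s_blocks = partition(R, n), partition(S, n)
    shuffled = defaultdict(list)  # simulated shuffle: key = index of the point in R
    for rb, sb in product(r_blocks, s_blocks):  # n * n block pairs -> n * n map tasks
        for i, local in map_phase(rb, sb, k):
            shuffled[i].append(local)
    return {i: reduce_phase(lists, k) for i, lists in sorted(shuffled.items())}


R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (6.0, 5.0), (9.0, 9.0)]
result = knn_join(R, S, k=2)
for i, neighbours in result.items():
    print(i, [j for _, j in neighbours])  # prints "0 [0, 1]" then "1 [2, 3]"
```

The result is exact because every point of R is compared against all n blocks of S, and each block's local top k must contain any of that block's points that belong to the global top k. The cost of this simplicity is that each block is replicated n times across map tasks. That replication is the kind of time and space trade-off that smarter partitioning strategies try to reduce.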