Data clustering is a major field of unsupervised learning that has been extensively studied in research papers and scientific studies. It addresses several data-analysis problems by grouping similar entities into the same set. To date, many clustering algorithms have been developed using techniques based on centroids, density, and dendrograms; more than 100 distinct algorithms exist today, along with many enhancements of each. Data scientists therefore still struggle to choose the best clustering method from this diversity of techniques. In this paper we present a survey of the DBSCAN algorithm and its enhancements with respect to time requirements. A detailed comparison of DBSCAN versions is also provided to help data scientists decide which version of DBSCAN to use.
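For readers unfamiliar with the algorithm the survey covers, the following is a minimal, self-contained sketch of the core DBSCAN procedure (not any particular enhanced version from the survey): points with at least `min_pts` neighbors within radius `eps` are core points, clusters grow by expanding from core points, and unreachable points are labeled noise.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id; -1 marks noise."""
    labels = [None] * len(points)
    cluster = 0

    def neighbors(i):
        # Brute-force eps-neighborhood query (includes the point itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        labels[i] = cluster             # i is a core point: start a cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:   # j is also a core point: expand
                queue.extend(j_neighbors)
        cluster += 1
    return labels
```

The brute-force neighbor search makes this version O(n²), which is exactly the cost that the DBSCAN enhancements surveyed in the paper (e.g., index-based neighbor queries) aim to reduce.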
Hadoop HDFS is an organized, distributed collection of files. It is designed to store huge volumes of data and then retrieve and analyze them efficiently in a short amount of time. To retrieve and analyze data from HDFS, MapReduce jobs must be created, either directly in a programming language such as Java or indirectly in a high-level language such as HiveQL or Pig Latin. Creating MapReduce programs in a general-purpose programming language is a difficult task that requires considerable effort both for their creation and for their maintenance: writing MapReduce code by hand takes a lot of time, introduces bugs, harms readability, and impedes optimization. Professionals working in the field of big data try to avoid long, difficult programs; they look for much simpler alternatives such as graphical interfaces, reduced scripts like Pig Latin, or even SQL queries. This article proposes a MapReduce Query API, inspired by Hibernate Criteria, to simplify the code of MapReduce programs. The API provides a set of predefined methods for expressing restrictions, projections, logical conditions, and so on. An implementation of the Word Count example using the Query Criteria API is illustrated in this paper.
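The paper's actual API targets Java and Hadoop; the sketch below is only a Python analogy of the Criteria style it describes, applied to the Word Count example. The class and method names here are illustrative assumptions, not the paper's documented interface: the point is that restrictions are expressed as chained predefined calls rather than hand-written job code.

```python
import re
from collections import Counter

class WordCountCriteria:
    """Illustrative fluent query builder in the spirit of Hibernate
    Criteria; names are assumptions, not the paper's actual API."""

    def __init__(self, text):
        self._words = re.findall(r"\w+", text.lower())
        self._filters = []

    def add_restriction(self, predicate):
        # predicate(word, count) -> bool; returns self for chaining,
        # Criteria-style.
        self._filters.append(predicate)
        return self

    def count(self):
        # Plays the role of the map (tokenize) and reduce (count) phases,
        # then applies the accumulated restrictions.
        counts = Counter(self._words)
        return {w: c for w, c in counts.items()
                if all(f(w, c) for f in self._filters)}
```

For example, `WordCountCriteria(text).add_restriction(lambda w, c: c >= 2).count()` keeps only words that occur at least twice, without the caller writing any mapper or reducer by hand.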
Regular expressions are heavily used in computer programming. They are known for their power to search for or replace parts of strings according to a given structure (e-mail addresses, phone numbers, etc.). Currently, regular expressions are used only to search for patterns or to make substitutions in strings. However, the need can be wider than that when it comes to ordering the results of a regular expression or grouping them according to some criteria. Developers are often called on to analyze the results of a regular expression by applying restrictions (equal, not equal, between), projections (maximum, average, group by), or sorts. Unfortunately, to perform these treatments the developer must implement his own algorithms, which costs considerable effort and time. We propose in this paper an API called RegexCriteria, inspired by Hibernate Criteria, to support developers in analyzing the results of a regular expression.
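The abstract does not specify RegexCriteria's interface, so the following is a hedged Python sketch of the idea it describes: wrapping regex match results in a Criteria-style object that supports restrictions, sorting, and grouping through chained calls. All method names (`restrict`, `order_by`, `group_count`) are assumptions for illustration, not the paper's API.

```python
import re
from collections import Counter

class RegexCriteria:
    """Illustrative Criteria-style wrapper over regex match results;
    method names are assumptions, not the paper's documented interface."""

    def __init__(self, pattern, text):
        self._matches = re.findall(pattern, text)

    def restrict(self, predicate):
        # Restriction: keep only matches satisfying the predicate.
        self._matches = [m for m in self._matches if predicate(m)]
        return self

    def order_by(self, key=None, reverse=False):
        # Sort: order the matches by an arbitrary key function.
        self._matches = sorted(self._matches, key=key, reverse=reverse)
        return self

    def group_count(self):
        # Projection: group identical matches and count occurrences.
        return dict(Counter(self._matches))

    def list(self):
        return list(self._matches)
```

For example, extracting all numbers from a log, keeping those above a threshold, and sorting them descending becomes a single chained expression instead of a hand-written post-processing loop.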