MR-DIS: democratic instance selection for big data by MapReduce

Arnaiz-González, Álvar; González-Rogel, Alejandro; Díez-Pastor, José-Francisco; López-Nozal, Carlos

doi:10.1007/s13748-017-0117-5

Cited by 27 publications

(10 citation statements)

References 22 publications


“…In particular, we can find approaches based on k‐NN for Big Data such as Peralta et al () where FS is performed on huge datasets using the k‐NN algorithm within an evolutionary approach, or a distributed Spark‐based version of the ReliefF algorithm Palma‐Mendoza, Rodriguez, and de‐Marcos (). In Arnaiz‐González, González‐Rogel, Díez‐Pastor, and López‐Nozal () a parallel implementation of the Democratic IS algorithm (DIS) is presented, called MR‐DIS. The idea of DIS algorithm is to apply a classic IS algorithm over a number of equally sized partitions of the training data.…”

Section: The K‐nn Algorithm As a Tool To Transform Big Data Into Smar… (mentioning)

confidence: 99%

Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data

Triguero, García-Gil, Maillo, et al. 2018. WIREs Data Min & Knowl

127


The k‐nearest neighbors algorithm is characterized as a simple yet effective data mining technique. Its main drawback appears when massive amounts of data, likely to contain noise and imperfections, are involved, turning the algorithm into an imprecise and especially inefficient technique. These disadvantages have been the subject of research for many years, and, among other approaches, data preprocessing techniques such as instance reduction or missing values imputation have targeted these weaknesses. As a result, these issues have turned into strengths, and the k‐nearest neighbors rule has become a core algorithm to identify and correct imperfect data, removing noisy and redundant samples or imputing missing values, and so transforming Big Data into Smart Data, which is data of sufficient quality to expect a good outcome from any data mining algorithm. The role of this smart data gleaning algorithm in a supervised learning context is investigated. This includes a brief overview of Smart Data, current and future trends for the k‐nearest neighbor algorithm in the Big Data context, and the existing data preprocessing techniques based on this algorithm. We present the emerging big data‐ready versions of these algorithms and develop some new methods to cope with Big Data. We carry out a thorough experimental analysis on a series of big datasets that provides guidelines on how to use the k‐nearest neighbor algorithm to obtain Smart/Quality Data for a high‐quality data mining process. Moreover, multiple Spark Packages have been developed, including all the Smart Data algorithms analyzed. This article is categorized under: Technologies > Data Preprocessing; Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Classification
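The first citation statement above describes DIS as applying a classic IS algorithm over a number of equally sized partitions of the training data. The following is a minimal, sequential Python sketch of that idea, not the MR-DIS implementation. Several parts are assumptions that the excerpt does not state: Hart's condensed nearest neighbour stands in for the "classic IS algorithm", the partitioning is repeated over several rounds, each removal counts as one vote, and a fixed `threshold` decides the final removal. All function and parameter names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def condensed_nn(data, idxs):
    """Stand-in classic IS step (assumption): Hart's condensed nearest neighbour.

    data is a list of (features, label) pairs; idxs lists the instances of one
    partition. Returns the set of indices the condensation keeps.
    """
    store = [idxs[0]]
    changed = True
    while changed:
        changed = False
        for i in idxs:
            if i in store:
                continue
            nearest = min(store, key=lambda j: sq_dist(data[i][0], data[j][0]))
            if data[nearest][1] != data[i][1]:
                # Misclassified by 1-NN on the current store, so keep it.
                store.append(i)
                changed = True
    return set(store)


def democratic_selection(data, rounds=5, partition_size=20, threshold=3, seed=0):
    """Return the sorted indices of instances that survive the vote.

    Each round shuffles the data, splits it into disjoint partitions of
    partition_size instances (the last one may be smaller), and runs the IS
    step on each partition independently. An instance removed in a partition
    receives one vote; instances with fewer than `threshold` votes are kept.
    The fixed threshold is a simplification made for this sketch.
    """
    rng = random.Random(seed)
    votes = [0] * len(data)
    order = list(range(len(data)))
    for _ in range(rounds):
        rng.shuffle(order)
        for start in range(0, len(order), partition_size):
            part = order[start:start + partition_size]
            kept = condensed_nn(data, part)
            for i in part:
                if i not in kept:
                    votes[i] += 1
    return [i for i, v in enumerate(votes) if v < threshold]
```

Because each partition is processed independently of the others, the per-partition step is the natural candidate for a map task and the vote counting for a reduce task; how MR-DIS actually organizes this is not described in the excerpts on this page.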


“…In [16] discussed a Democratic Instance Selection (DIS) algorithm for parallel implementation. DIS algorithm achieved less computational complexity, linearity in the number of instances and intuitively parallelized internal configuration.…”

Section: Literature Review (mentioning)

confidence: 99%

Resource Allocation Based on Matchmaking Services in Multiple Clouds Using Trustworthy and Scalable Service Providers Algorithm

Solainayagi, Ponnusamy. 2019. IJIES


Nowadays, cloud server adoption is becoming more popular and is in high demand because data can be contributed and retrieved without limit, from anywhere and at any time. Existing methods suffer from adaptability issues when calculating trust values across multi-dimensional cloud service providers. Some existing methods rely on expert opinion to evaluate the trust factors. However, these techniques have many adaptability issues, and their trust evaluation results contain many errors. To provide a better solution, the Trustworthy and Scalable Service Providers Algorithm is proposed for analyzing the relationships among the users, the broker, and cloud service providers. The proposed method performs resource allocation based on a matchmaking service among multiple clouds. This trust-based method efficiently reduces the burden on cloud users and enhances system stability. It uses information entropy theory to evaluate multi-attribute decision-making. Here, cloud users can efficiently find trustworthy cloud service providers in advance, while cloud service providers are more dependent on cloud users. Based on the experimental results, the proposed method reduces system execution time by 2 milliseconds and communication cost by 9.33%, and improves the trust score by 39.33% compared with existing methodologies in multiple cloud environments.


“…The observed fact about big data processing is the increased computational complexity because of the high volume [21]. The analysis of big data in supervised classification is based on the learning algorithms, and after that, it finds the appropriate classes for the datasets [22].…”

Section: Introduction (mentioning)

confidence: 99%

Analysis of Bayesian optimization algorithms for big data classification based on Map Reduce framework

Banchhor, Srinivasu. 2021. J Big Data


The process of big data handling refers to the efficient management of storage and processing of a very large volume of data. Data in structured and unstructured formats require a specific approach for overall handling. The classifiers analyzed in this paper are the correlative naïve Bayes classifier (CNB), Cuckoo Grey wolf CNB (CGCNB), Fuzzy CNB (FCNB), and Holoentropy CNB (HCNB). These classifiers are based on the Bayesian principle and work accordingly. The CNB is developed by extending the standard naïve Bayes classifier with correlation applied among the attributes, so that it becomes a dependent hypothesis. The cuckoo search and grey wolf optimization algorithms are integrated with the CNB classifier, and a significant performance improvement is achieved. The resulting classifier is called the cuckoo grey wolf correlative naïve Bayes classifier (CGCNB). Also, the performance of the FCNB and HCNB classifiers is analyzed alongside that of CNB and CGCNB by considering accuracy, sensitivity, specificity, memory, and execution time.
