An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark

Hasan, Raed Abdullah; Alhayali, Royida A. Ibrahem; Zaki, Nashwan Dheyaa; Ali, Ahmed H.

doi:10.12928/telkomnika.v17i6.11711

Cited by 26 publications

(22 citation statements)

References 34 publications

“…It involves the use of FS algorithms to filter out irrelevant and redundant data features from the original dataset to prevent over-fitting [6,13] and improve the classification accuracy of the model. Feature selection also reduces the classification models' complexity in time and space domains [14][15][16][17][18]. The main idea of this paper is to employ the TLBO-based algorithm for features subset selection in BC diagnosis.…”

Section: Telkomnika Telecommun Comput El Control · mentioning

confidence: 99%

A new model for large dataset dimensionality reduction based on teaching learning-based optimization and logistic regression

et al. 2020

Breast cancer (BC) is one of the human diseases with a high mortality rate each year. Among all forms of cancer, BC is the most common cause of death among women globally. Data mining and classification methods are effective ways of classifying data. These methods are particularly useful in the medical field, where datasets often contain irrelevant and redundant attributes. Such attributes are not needed to obtain an accurate disease diagnosis. Teaching learning-based optimization (TLBO) is a new metaheuristic that has been successfully applied to several intractable optimization problems in recent years. This paper presents the use of a multi-objective TLBO algorithm for the selection of feature subsets in automatic BC diagnosis. The logistic regression (LR) method was used for the classification task. The proposed method produced better classification accuracy on the BC dataset (classified into malignant and benign). This result showed that the proposed TLBO is an efficient feature optimization technique for sustaining data-based decision-making systems.

“…Hence, a hybrid adaptive approach called Hoeffding Naive Bayes Tree (hnbt) which performs better than the component prediction methods for both complex and simple concepts has been proposed. This concept of this method based on executing a naive Bayes prediction on each training feature, then, comparing the prediction performance with the majority class [19][20][21][22][23][24][25]. The number of times the naïve Bayes makes a correct prediction of the true class is noted (by taking counts) compared to the majority class.…”

Section: Hoeffding Tree (Ht) · mentioning

confidence: 99%

Efficient method for breast cancer classification based on ensemble hoffeding tree and naïve Bayes

Alhayali, Ahmed, Mohialden, et al. 2020, IJEECS (self-citation)

The most dangerous type of cancer suffered by women above 35 years of age is breast cancer. Breast cancer datasets are normally characterized by missing data, high dimensionality, non-normal distribution, class imbalance, noise, and inconsistency. Classification is a machine learning (ML) process that plays a significant role in predicting outcomes, and one of the outstanding supervised classification methods in data mining is naïve Bayes classification (NBC). NBC is good at predicting outcomes and often outperforms other classification techniques. One of the reasons behind this strong performance is the assumption of conditional independence among the initial parameters and the predictors. However, this assumption is not always true and can cause loss of accuracy. Hoeffding trees assume the suitability of using a small sample to select the optimal splitting attribute. This study proposes a new method for improving the classification accuracy of breast cancer datasets. The method uses Hoeffding trees for normal classification and naïve Bayes for reducing data dimensionality.
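
The citation statement for this entry describes an adaptive leaf rule: make a naive Bayes prediction for each training instance, compare it with the majority-class prediction, and count how often each one is correct. A minimal Python sketch of that counting rule follows. It is an illustration under stated assumptions, not the cited authors' implementation: the class name `AdaptiveLeaf`, its methods, the restriction to categorical features, and the Laplace smoothing are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


class AdaptiveLeaf:
    """Hypothetical leaf that uses naive Bayes only while it has beaten
    the majority-class rule on the instances seen so far."""

    def __init__(self):
        self.class_counts = Counter()               # label -> instances seen
        self.feature_counts = defaultdict(Counter)  # (label, index) -> value counts
        self.values = defaultdict(set)              # index -> distinct values seen
        self.nb_correct = 0                         # correct naive Bayes predictions
        self.mc_correct = 0                         # correct majority-class predictions

    def _majority(self):
        if not self.class_counts:
            return None
        return self.class_counts.most_common(1)[0][0]

    def _naive_bayes(self, x):
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, v in enumerate(x):
                # Laplace smoothing (an assumption of this sketch).
                seen = self.feature_counts[(label, i)][v]
                score += math.log((seen + 1) / (n + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, x, y):
        # Test-then-train: score both predictors before updating the counts.
        if self._majority() == y:
            self.mc_correct += 1
        if self._naive_bayes(x) == y:
            self.nb_correct += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(y, i)][v] += 1
            self.values[i].add(v)

    def predict(self, x):
        if self.nb_correct > self.mc_correct:
            return self._naive_bayes(x)
        return self._majority()


# Toy stream: the label is fully determined by the single feature.
leaf = AdaptiveLeaf()
for k in range(20):
    leaf.learn((k % 2,), "a" if k % 2 == 0 else "b")
prediction = leaf.predict((1,))
```

On this balanced toy stream the majority-class rule is right only about half the time, so the naive Bayes counter overtakes it and the leaf switches to naive Bayes predictions.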

“…Since the conventional computing techniques could not provide the expected result and efficiency to manage big data. The different distributed frameworks like hadoop [4], spark [5], and storm [6] have been introduced to satisfy the prerequisite of taking care of the big data.…”

Section: Introduction · mentioning

confidence: 99%

A smart method for spark using neural network for big data

Rahman, Hossen, Sultana, et al. 2021, IJECE

Apache Spark, well known for its ability to handle big data, is a distributed open-source framework that uses distributed memory to process big data. Because Spark's performance depends heavily on its predominant configuration parameters, achieving optimal results from Spark is challenging. The current practice of tuning these parameters manually is ineffective, because the parameter space is large and the parameters interact in complex ways. This paper proposes a more effective, self-tuning approach based on a neural network, called Smart method for spark using neural network for big data (SSNNB), to avoid the disadvantages of manual tuning. The approach was tested on five predominant parameters with five different data sizes. It increased speed by around 30% compared with the default parameter configuration.
