Continuously generated big data from social networking sites such as Twitter and Facebook has been an attractive research topic in recent decades because of its timeliness, accessibility, and popularity, although there may be a trade-off in accuracy. Clustering of Twitter data in particular has drawn researchers' attention. An algorithm that can cluster data in less computational time, especially for data streaming, is therefore needed. The presented adaptive clustering and classification algorithm, designed for data streaming in Apache Spark to overcome existing problems, proceeds in two phases. In the first phase, the pre-processed Twitter input data is clustered using an Improved Fuzzy C-means algorithm, and the clustering is further improved by an Adaptive Particle Swarm Optimization (PSO) algorithm. The clustered data stream is then evaluated using the Spark engine. In the second phase, the pre-processed Higgs input data is classified using a modified support vector machine (MSVM) classifier with grid search optimization. Finally, the optimized data is evaluated in the Spark engine, and the resulting values are used to compute a confusion matrix. The proposed work uses the Twitter and Higgs datasets for data streaming in Apache Spark. The computational experiments demonstrate the superiority of the presented approach over existing methods in terms of precision, recall, F-score, convergence, ROC curve, and accuracy.
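As a rough illustration of the clustering step, the sketch below implements the textbook fuzzy C-means update (soft memberships, then membership-weighted centers). It is not the Improved Fuzzy C-means or the adaptive PSO refinement described in the abstract, and it runs on a single machine rather than in Spark. The function name `fuzzy_c_means`, its parameters, and the toy points are assumptions made for this example.

```python
def fuzzy_c_means(points, init_centers, m=2.0, iters=50, eps=1e-12):
    """Standard fuzzy C-means (hypothetical helper, not the paper's variant).

    points: list of equal-length numeric tuples
    init_centers: starting centers, one per cluster
    m: fuzzifier (> 1); larger values give softer memberships
    """
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership step: u_ik is proportional to d_ik^(-2/(m-1)),
        # normalised so each point's memberships sum to 1.
        memberships = []
        for p in points:
            d2 = [sum((p[j] - c[j]) ** 2 for j in range(dim)) + eps
                  for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center step: mean of the points weighted by u_ik^m.
        for i in range(len(centers)):
            weights = [u[i] ** m for u in memberships]
            wsum = sum(weights)
            centers[i] = [
                sum(w * p[j] for w, p in zip(weights, points)) / wsum
                for j in range(dim)
            ]
    return centers, memberships


# Toy usage: two well-separated groups of 2-D points.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_centers, demo_u = fuzzy_c_means(
    demo_points, [demo_points[0], demo_points[-1]]
)
```

Each row of the returned membership matrix gives one point's degree of belonging to every cluster, which is what distinguishes fuzzy C-means from hard assignment methods such as k-means.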
Intrusion detection is mainly achieved by using optimization algorithms. The need for optimization algorithms in intrusion detection arises from the increasing number of features in audit data, as well as from the shortcomings of human-based smart intrusion detection systems (IDS) in terms of prolonged training time and classification accuracy. This article presents an improved intrusion detection technique for binary classification. The proposal combines several methods, including the Rao optimization algorithm, extreme learning machine (ELM), support vector machine (SVM), and logistic regression (LR) (for feature selection and weighting), as well as a hybrid Rao-SVM algorithm with supervised machine learning (ML) techniques for feature subset selection (FSS). Selecting the smallest number of features without sacrificing FSS accuracy was treated as a multi-objective optimization problem. The algorithm-specific, parameter-less concept of the proposed Rao-SVM was also explored in this study. The KDDCup 99 and CICIDS 2017 datasets were used as the intrusion datasets for the experiments, where significant improvements were noted with the new Rao-SVM compared to the other algorithms. Rao-SVM presented better results than many existing works, reaching 100% accuracy on the KDDCup 99 dataset and 97% on the CICIDS dataset.
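To make the "parameter-less" idea concrete, here is a minimal sketch of the Rao-1 update as it is commonly described: each candidate moves by a random fraction of the difference between the current best and worst solutions, so only population size and iteration count need to be chosen. This is an assumption-laden illustration on a toy objective, not the paper's Rao-SVM hybrid; the function name `rao1_minimize`, the greedy acceptance step, and the bound clipping are choices made for this example.

```python
import random


def rao1_minimize(objective, bounds, pop_size=20, iters=100, seed=0):
    """Minimise `objective` with a Rao-1 style update (illustrative sketch)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    history = [min(scores)]  # best score after each iteration
    for _ in range(iters):
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for k in range(pop_size):
            candidate = []
            for j, (lo, hi) in enumerate(bounds):
                # Rao-1 move: x' = x + r * (x_best - x_worst), r in [0, 1)
                step = rng.random() * (best[j] - worst[j])
                candidate.append(min(hi, max(lo, pop[k][j] + step)))
            score = objective(candidate)
            if score < scores[k]:  # keep the move only if it improves
                pop[k], scores[k] = candidate, score
        history.append(min(scores))
    i = scores.index(min(scores))
    return pop[i], scores[i], history


# Toy usage: minimise the sum of squares over a 3-D box.
solution, value, trace = rao1_minimize(
    lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3
)
```

For feature subset selection, the objective would instead score a candidate feature mask, for example by combining classifier error with the number of selected features; how the paper weights those two goals is not stated in the abstract.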
The most dangerous type of cancer suffered by women above 35 years of age is breast cancer. Breast cancer datasets are normally characterized by missing data, high dimensionality, non-normal distribution, class imbalance, noise, and inconsistency. Classification is a machine learning (ML) process that plays a significant role in the prediction of outcomes, and one of the outstanding supervised classification methods in data mining is Naïve Bayes classification (NBC). Naïve Bayes classification is good at predicting outcomes and often outperforms other classification techniques. One of the reasons behind this strong performance of NBC is the assumption of conditional independence among the initial parameters and the predictors. However, this assumption is not always true and can cause a loss of accuracy. Hoeffding trees assume that a small sample is sufficient to select the optimal splitting attribute. This study proposes a new method for improving the accuracy of classification of breast cancer datasets. The method proposes the use of Hoeffding trees for normal classification and naïve Bayes for reducing data dimensionality.
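The small-sample idea behind Hoeffding trees rests on the Hoeffding bound: after n observations of a quantity with range R, the observed mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean with probability at least 1 − δ. The sketch below shows that bound and the usual split test built on it; the function names and the example numbers are assumptions for illustration, not values from the study.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean of n samples is within epsilon
    of the true mean with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain, second_gain, value_range, delta, n):
    """Split when the gap between the two best attributes exceeds epsilon,
    i.e. the current best attribute is very likely the true best."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical example: gains measured on 1000 samples, range 1, delta 1e-7.
epsilon = hoeffding_bound(1.0, 1e-7, 1000)
decision = can_split(0.30, 0.15, 1.0, 1e-7, 1000)
```

Because ε shrinks as n grows, a node that cannot be split yet simply waits for more samples, which is why the tree can be built incrementally from a stream.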
Various studies have been conducted to discover the mechanisms that led to the evolution of asymmetric group formation in countless marine animals. The huge number of tasks arriving per unit of time makes it difficult to assign each task to a particular server, and task assignment needs a fast decision strategy. Artificial fish affect the environment through their own behavior and the behavior of their peers. A synthetic fish model has two parts, variables and functions, which can be used for task assignment. This paper presents an improved fish swarm algorithm (IFSA) for task assignment to reduce latency in cloud computing, which could help achieve one of the goals of green computing. The research aims to reduce the number of pending jobs compared with existing research.
Advancements in information technology are contributing to the excessive rate of big data generation in recent years. Big data refers to datasets that are huge in volume and consume much time and space to process and transmit using the available resources. Big data also covers data in both unstructured and structured formats. Many agencies are currently investing in research on big data analytics because existing data processing techniques fail to handle the rate at which big data is generated. This paper presents an efficient classification and reduction technique for big data based on the parallel generalized Hebbian algorithm (GHA), which is one of the commonly used principal component analysis (PCA) neural network (NN) learning algorithms. The new method proposed in this study was compared with existing methods to demonstrate its capability to reduce the dimensionality of big data. The proposed method is implemented using the Spark Radoop platform.
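For readers unfamiliar with GHA, the following is a minimal sequential sketch of the generalized Hebbian (Sanger) update, in which component i learns from the input after the reconstructions of components 0..i have been subtracted. It does not reproduce the paper's parallel version or its Spark Radoop implementation. The names `gha_fit` and `make_demo_data`, the learning rate, and the synthetic data are assumptions made for this example, and the input is assumed to be zero-mean.

```python
import math
import random


def gha_fit(samples, n_components, lr=0.005, epochs=3):
    """Estimate leading principal directions with Sanger's rule (sketch)."""
    dim = len(samples[0])
    # Small deterministic initial weights, one row per component.
    w = [[0.1 if j == i else 0.0 for j in range(dim)]
         for i in range(n_components)]
    for _ in range(epochs):
        for x in samples:
            y = [sum(wi[j] * x[j] for j in range(dim)) for wi in w]
            residual = list(x)
            new_w = []
            for i in range(n_components):
                # Remove what component i (and all earlier ones) explains.
                residual = [residual[j] - y[i] * w[i][j] for j in range(dim)]
                # Hebbian step on the remaining signal.
                new_w.append(
                    [w[i][j] + lr * y[i] * residual[j] for j in range(dim)]
                )
            w = new_w
    return w


def make_demo_data(n=5000, seed=0):
    """Zero-mean 2-D points whose main axis of variation is (1, 2)."""
    rng = random.Random(seed)
    s = math.sqrt(5.0)
    data = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)   # spread along (1, 2) / sqrt(5)
        b = rng.gauss(0.0, 0.3)   # spread along (-2, 1) / sqrt(5)
        data.append((a / s - 2.0 * b / s, 2.0 * a / s + b / s))
    return data


# Toy usage: learn the first principal direction of the synthetic data.
weights = gha_fit(make_demo_data(), n_components=1)
```

Projecting each sample onto the learned rows gives the reduced representation; the rule needs only one sample at a time, which is what makes it a candidate for streaming or partitioned data.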