Big data and machine learning framework for clouds and its usage for text classification

Pintye, Istvan; Kail, Eszter; Kacsuk, Péter; Lovas, Róbert

doi:10.1002/cpe.6164

Cited by 11 publications

(10 citation statements)

References 23 publications


“…Zhao et al [14] suggested that big data has great potential in improving policy description and strengthening policy prediction ability. Pintye et al [15] studied the relevant policy framework of big data and proposed that the whole social governance, policies, and aspects closely related to big data should be considered as a complete system, involving user privacy, data accuracy, data collection methods, and social equity.…”

Section: Related Work (mentioning)

confidence: 99%

Design and Application of Intelligent Management Platform Based on Big Data

Dai, Zhou. 2022. Security and Communication Networks.


Big data technology has greatly promoted the construction of intelligent administrative management and improved the decision-making ability continuously. Data mining also lays a solid foundation for the construction of administrative management platform and reflects the potential value of data. In this study, an intelligent management platform based on big data is designed and implemented. First, the problems of the intellectualization of administrative management are discussed, and the big data platform and functional framework of administrative management are introduced. Second, in order to apply data mining to administrative management, the object of data mining in administrative management is defined, and a data mining system is designed. Finally, the application of machine learning methods such as cluster analysis in administrative management is analyzed in detail. The research results show that the application of intelligent management platform based on big data can promote the construction of intelligent administration and lay a good foundation for the development of more perfect intelligent administrative management.


“…Area under the curve (AUC), Matthews correlation coefficient, F1‐score, and precision‐recall curve are used for performance evaluation. AI methods are categorized into machine learning (ML) 13‐16,25,26 and deep learning algorithms 4,10,11,17,18 . ML methods are impressive, though, most of the methods involve manual FE due to the limited ability to manage a large set of features.…”

Section: Related Work (mentioning)

confidence: 99%

Big data analytics for identifying electricity theft using machine learning approaches in microgrids for smart communities

Arif, Javaid, Aldegheishem, et al. 2021. Concurrency and Computation.


Electricity theft (ET) causes major revenue loss in power utilities. It reduces the quality of supply, raises production cost, causes legal consumers to pay the higher cost, and impacts the economy as a whole. In this article, we use the State Grid Corporation of China (SGCC) dataset, which contains electricity consumption data of 1035 days for two classes: normal and fraudulent. In this work, ET detection model is proposed that consists of four steps: interpolation, data balancing, feature extraction, and classification. First, missing values of the dataset are recovered using the interpolation method. Second, resampling technique is implemented. ET consumers are 9% in the SGCC dataset that make the model inefficient to correctly classify both classes (normal and theft). A hybrid resampling technique is proposed, named synthetic minority oversampling technique with near miss. Third, residual network extracts the latent features from the SGCC dataset. Fourth, three tree based classifiers, such as decision tree (DT), random forest (RF), and adaptive boosting (AdaBoost) are applied to train the encoded feature vectors for classification. Besides, search for good hyperparameters is a challenging task, which is usually done manually and takes a considerable amount of time. To resolve this problem, Bayesian optimizer is used to simplify the tuning process of DT, RF, and AdaBoost. Finally, the results indicate that RF outperforms DT and AdaBoost.


“…The configuration of Spark is adjusted to control the level of parallelism applied to the data. This text analysis application was successfully handled by using the Spark-based reference architecture deployed on the ELKH Cloud, and the scientific findings have been already publicly released [31].…”

Section: Validation By the Hungarian Comparative Agendas Project (mentioning)

confidence: 99%

Cloud-agnostic architectures for machine learning based on Apache Spark

Nagy, Lovas, Pintye, et al. 2021. Advances in Engineering Software.

Self Cite


Reference architectures for Big Data, machine learning and stream processing include not only recommended practices and interconnected building blocks but considerations for scalability, availability, manageability, and security as well. However, the automated deployment of multi-VM platforms on various clouds leveraging on such reference architectures may raise several issues. The paper focuses particularly on the widespread Apache Spark Big Data platform as the baseline and the Occopus cloud-agnostic orchestrator tool. The set of new generation reference architectures are configurable by human-readable descriptors according to available resources and cloud-providers, and offers various components such as Jupyter Notebook, RStudio, HDFS, and Kafka. These pre-configured reference architectures can be automatically deployed even by the data scientist on-demand, using a multi-cloud approach for a wide range of cloud systems like Amazon AWS, Microsoft Azure, Open-Stack, OpenNebula, CloudSigma, etc. Occopus enables the scaling of cluster-oriented components (such as Spark) of the instantiated reference architectures. The presented solution was successfully used in the Hungarian Comparative Agendas Project (CAP) by the Institute for Political Science to classify newspaper articles.
