Context pre-modeling: an empirical analysis for classification based user-centric context-aware predictive modeling

Sarker, Iqbal H.; Alqahtani, Hamed; Alsolami, Fawaz; Khan, Asif Irshad; Abushark, Yoosef B.; Siddiqui, Mohammad Khubeb

doi:10.1186/s40537-020-00328-3

Cited by 26 publications

(20 citation statements)

References 45 publications

(60 reference statements)

“…A right and optimal subset of the selected features in a problem domain is capable to minimize the overfitting problem through simplifying and generalizing the model as well as increases the model's accuracy [97]. Thus, "feature selection" [66,99] is considered as one of the primary concepts in machine learning that greatly affects the effectiveness and efficiency of the target machine learning model. Chi-squared test, Analysis of variance (ANOVA) test, Pearson's correlation coefficient, recursive feature elimination, are some popular techniques that can be used for feature selection.…”

Section: Dimensionality Reduction and Feature Learning (mentioning)

confidence: 99%
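As an illustration of the filter-style feature selection named in the statement above, the following is a minimal sketch of chi-squared scoring for categorical features. It uses the classical contingency-table statistic; the function names (`chi2_score`, `select_k_best`), the toy columns, and the labels are hypothetical and do not come from the cited paper.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Chi-squared statistic of independence between one categorical
    feature and the class labels (classical contingency-table form)."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_value, f_count in feature_counts.items():
        for y_value, y_count in label_counts.items():
            expected = f_count * y_count / n  # count expected under independence
            score += (observed[(f_value, y_value)] - expected) ** 2 / expected
    return score


def select_k_best(columns, labels, k):
    """Return the names of the k features with the highest chi-squared score."""
    scores = {name: chi2_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: "location" determines the label, "weekday" barely does.
columns = {
    "location": ["home", "home", "office", "office", "home", "office"],
    "weekday": ["mon", "tue", "mon", "tue", "mon", "tue"],
}
labels = ["reject", "reject", "accept", "accept", "reject", "accept"]

selected = select_k_best(columns, labels, k=1)
```

A higher score means the feature and the label deviate more from independence, so ranking by score and keeping the top `k` is one simple way to pick a feature subset.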

“…-Feature extraction: In a machine learning-based model or system, feature extraction techniques usually provide a better understanding of the data, a way to improve prediction accuracy, and to reduce computational cost or training time. The aim of "feature extraction" [66,99] is to reduce the number of features in a dataset by generating new ones from the existing ones and then discarding the original features. The majority of the information found in the original set of features can then be summarized using this new reduced set of features.…”

Section: Dimensionality Reduction and Feature Learning (mentioning)

confidence: 99%
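The statement above describes feature extraction only in general terms. One common instance is principal component analysis (PCA), which is an assumption here rather than something the statement names. The sketch below finds the first principal component by power iteration and projects each row onto it, so three original features are summarized by one new feature; the function name and the toy rows are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) for the first principal
    component, found by power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Start from a uniform unit vector and repeatedly multiply by the covariance.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The new single feature is the projection of each centered row onto v.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical toy data: the second column is roughly twice the first,
# and the third column is low-variance noise.
rows = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
]
component, scores = first_principal_component(rows)
```

Here the two correlated columns dominate the extracted direction, and the original three columns can be discarded in favor of `scores`.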

“…This feature selection algorithm looks only at the (X) features, not the (y) outputs needed, and can, therefore, be used for unsupervised learning. • Pearson correlation: Pearson's correlation is another method to understand a feature's relation to the response variable and can be used for feature selection [99]. This method is also used for finding the association between the features in a dataset.…”

Section: Dimensionality Reduction and Feature Learning (mentioning)

confidence: 99%
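A minimal sketch of the Pearson-based ranking described above: each feature is scored by the absolute value of its correlation with the response variable, and the same function can be applied to a pair of features to check how strongly they are associated. The feature names and values are hypothetical toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.
    Assumes neither sequence is constant (otherwise the denominator is zero)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical toy data: "duration" tracks the response, "noise" does not.
features = {
    "duration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, 1.0, 4.0, 1.0, 2.0],
}
response = [2.1, 3.9, 6.2, 8.0, 9.8]

# Rank features by the strength of their linear relation to the response.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], response)),
    reverse=True,
)

# Feature-to-feature association, useful for spotting redundant features.
between_features = pearson(features["duration"], features["noise"])
```

Pearson correlation only captures linear relations, so a feature with a low score here may still be informative to a non-linear model.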

Machine Learning: Algorithms, Real-World Applications and Research Directions

2021

Self Cite

In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, the knowledge of artificial intelligence (AI), particularly, machine learning (ML) is the key. Various types of machine learning algorithms such as supervised, unsupervised, semi-supervised, and reinforcement learning exist in the area. Besides, the deep learning, which is part of a broader family of machine learning methods, can intelligently analyze the data on a large scale. In this paper, we present a comprehensive view on these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study's key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.

“…Before the data is ready for modeling, it’s necessary to use data summarization and visualization to audit the quality of the data and provide the information needed to process it. To ensure the quality of the data, the data pre-processing technique, which is typically the process of cleaning and transforming raw data [ 107 ] before processing and analysis is important. It also involves reformatting information, making data corrections, and merging data sets to enrich data.…”

Section: Understanding Data Science Modeling (mentioning)

confidence: 99%
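The cleaning and transformation steps mentioned in the statement above could look like the following minimal sketch, which removes exact duplicate records, fills a missing numeric value with the mean of the observed ones, and rescales the column to the range 0 to 1. The function name, the record layout, and the choice of mean imputation and min-max scaling are all assumptions made for illustration.

```python
def clean_and_scale(records, numeric_key):
    """Deduplicate records, mean-impute missing values in one numeric
    column, and min-max scale that column to [0, 1]."""
    # Drop exact duplicates while preserving the original order.
    seen, unique = set(), []
    for record in records:
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(dict(record))

    # Replace missing values (None) with the mean of the observed values.
    observed = [r[numeric_key] for r in unique if r[numeric_key] is not None]
    mean = sum(observed) / len(observed)
    for r in unique:
        if r[numeric_key] is None:
            r[numeric_key] = mean

    # Min-max scaling; a constant column is mapped to 0.0.
    low = min(r[numeric_key] for r in unique)
    high = max(r[numeric_key] for r in unique)
    span = (high - low) or 1.0
    for r in unique:
        r[numeric_key] = (r[numeric_key] - low) / span
    return unique


# Hypothetical raw records with one duplicate and one missing value.
raw = [
    {"user": "a", "duration": 10.0},
    {"user": "b", "duration": None},
    {"user": "a", "duration": 10.0},
    {"user": "c", "duration": 30.0},
]
cleaned = clean_and_scale(raw, "duration")
```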

Data Science and Analytics: An Overview from Data-Driven Smart Computing, Decision-Making and Applications Perspective

Sarker

2021

SN COMPUT. SCI.

Self Cite

219

The digital world has a wealth of data, such as internet of things (IoT) data, business data, health data, mobile data, urban data, security data, and many more, in the current age of the Fourth Industrial Revolution (Industry 4.0 or 4IR). Extracting knowledge or useful insights from these data can be used for smart decision-making in various applications domains. In the area of data science, advanced analytics methods including machine learning modeling can provide actionable insights or deeper knowledge about data, which makes the computing process automatic and smart. In this paper, we present a comprehensive view on "Data Science" including various types of advanced analytics methods that can be applied to enhance the intelligence and capabilities of an application through smart decision-making in different scenarios. We also discuss and summarize ten potential real-world application domains including business, healthcare, cybersecurity, urban and rural data science, and so on by taking into account data-driven smart computing and decision making. Based on this, we finally highlight the challenges and potential research directions within the scope of our study. Overall, this paper aims to serve as a reference point on data science and advanced analytics to the researchers and decision-makers as well as application developers, particularly from the data-driven solution point of view for real-world problems.

“…In this paper, we propose a model based on the Isolation Forest algorithm. At first, we perform necessary preprocessing steps like categorical feature encoding, feature scaling [11] to extract fifteen essential features to fit into the proposed model. Finally, we applied five popular classification algorithms [14] such as Logistic Regression (LR), Support Vector Machine (SVM), AdaBoost Classifier (ABC), Naive Bayes (NB), and K-Nearest Neighbor (KNN) to evaluate the performance of our system.…”

Section: Introduction (mentioning)

confidence: 99%
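To make the outlier-removal step in the statement above concrete, here is a simplified, self-contained sketch of the Isolation Forest idea: points that random axis-aligned splits isolate in few steps receive a high anomaly score. This is not the cited authors' implementation; the function names, the tree and sample-size settings, the toy data, and the choice to drop a fixed number of top-scoring points are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup
    over n points; used to normalize isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    if low == high:
        return {"size": len(rows)}
    split = rng.uniform(low, high)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([r for r in rows if r[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[feature] >= split], depth + 1, max_depth, rng),
    }


def path_length(row, tree):
    """Depth at which the row lands in a leaf, plus a correction for leaf size."""
    depth = 0
    while "size" not in tree:
        tree = tree["left"] if row[tree["feature"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + c_factor(tree["size"])


def anomaly_scores(rows, n_trees=100, seed=0):
    """Score each row in (0, 1]; higher means easier to isolate.
    Assumes at least two rows."""
    rng = random.Random(seed)
    sample_size = min(256, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2.0 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
        for r in rows
    ]


# Hypothetical data: 20 points in the unit square plus one distant point.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(20)] + [[10.0, 10.0]]
scores = anomaly_scores(rows)

# Drop the n_outliers highest-scoring rows before training a classifier.
n_outliers = 1
order = sorted(range(len(rows)), key=lambda i: scores[i])
kept = [rows[i] for i in order[: len(rows) - n_outliers]]
```

The rows in `kept` would then be passed to the downstream classifiers (LR, SVM, and so on) for evaluation; that step is omitted here.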

An Isolation Forest Learning Based Outlier Detection Approach for Effectively Classifying Cyber Anomalies

Ripan, Sarker, Anwar et al. 2021

Preprint

Self Cite

Cybersecurity has recently gained considerable interest in today's security issues because of the popularity of the Internet-of-Things (IoT), the considerable growth of mobile networks, and many related apps. Therefore, detecting numerous cyber-attacks in a network and creating an effective intrusion detection system plays a vital role in today's security. However, it is difficult to accurately model cyber threats since modern security databases contain a large number of security features that could include outliers. In this paper, we present an Isolation Forest Learning-Based Outlier Detection Model for effectively classifying cyber anomalies. In order to evaluate the efficacy of the resulting Outlier Detection model, we also use several conventional machine learning approaches, such as Logistic Regression (LR), Support Vector Machine (SVM), AdaBoost Classifier (ABC), Naive Bayes (NB), and K-Nearest Neighbor (KNN). The effectiveness of our proposed Outlier Detection model is evaluated by conducting experiments on a Network Intrusion Dataset with evaluation metrics such as precision, recall, F1-score, and accuracy. Experimental results show that the classification accuracy of cyber anomalies has been improved after removing outliers.
