Digitization is generating huge volumes of data across sectors such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms uncover patterns among the attributes of this data, and the resulting predictions can help medical practitioners and managers make executive decisions. Not all attributes in the generated datasets are important for training machine learning algorithms: some are irrelevant, and some do not affect the outcome of the prediction. Removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated with four popular Machine Learning (ML) algorithms, Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier, and Random Forest Classifier, using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine Machine Learning Repository. The experimental results show that PCA outperforms LDA on all measures. The performance of the Decision Tree and Random Forest classifiers is also not much affected by using PCA or LDA. To further analyze the performance of PCA and LDA, experiments are carried out on Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the datasets is high. When the dimensionality of the datasets is low, the ML algorithms without dimensionality reduction yield better results. INDEX TERMS Cardiotocography dataset, dimensionality reduction, feature engineering, linear discriminant analysis, machine learning, principal component analysis.
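The comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' experimental setup: it assumes scikit-learn is available, uses synthetic data from `make_classification` as a stand-in for the CTG dataset, and picks a decision tree, 5-fold cross-validation, a 95% explained-variance threshold for PCA, and two LDA components purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data (assumption): a placeholder for a real dataset such as CTG.
X, y = make_classification(
    n_samples=600,
    n_features=30,
    n_informative=8,
    n_redundant=10,
    n_classes=3,
    random_state=0,
)


def evaluate(reducer):
    """Mean 5-fold accuracy of a decision tree, optionally after a reducer."""
    steps = [StandardScaler()]
    if reducer is not None:
        steps.append(reducer)
    steps.append(DecisionTreeClassifier(random_state=0))
    # The pipeline refits the scaler and reducer inside each fold,
    # so no information from the held-out fold leaks into them.
    model = make_pipeline(*steps)
    return cross_val_score(model, X, y, cv=5).mean()


scores = {
    "none": evaluate(None),
    # Keep enough principal components to explain 95% of the variance.
    "pca": evaluate(PCA(n_components=0.95)),
    # LDA yields at most (n_classes - 1) components, here 2.
    "lda": evaluate(LinearDiscriminantAnalysis(n_components=2)),
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Swapping `DecisionTreeClassifier` for another estimator (for example an SVM or a random forest) repeats the comparison for that classifier. Scores on synthetic data say nothing about the results reported in the abstract.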
Classification of imbalanced data has been widely explored over the last and present decades and remains important, because data are essential today and the problem becomes crucial when data are distributed across several classes. The term imbalance refers to an uneven distribution of data across classes, which severely affects the performance of traditional classifiers: they become biased toward the class with the larger amount of data. Data generated by wireless sensor networks exhibit several kinds of imbalance. This review article analyzes the imbalance issue for wireless sensor networks and other application domains, which will help the community understand the WHAT, WHY, and WHEN of imbalance in data and its remedies.
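To make the notion of imbalance concrete, the sketch below measures how uneven a label distribution is and derives per-class weights that are inversely proportional to class frequency. Class weighting is only one common remedy and is not necessarily among those the review covers; the function names and the `normal`/`fault` labels are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a weighted loss penalizes
    mistakes on them more heavily than mistakes on the majority class.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


# Hypothetical sensor readings: 90 normal samples and 10 fault samples.
labels = ["normal"] * 90 + ["fault"] * 10

ratio = imbalance_ratio(labels)
weights = balanced_class_weights(labels)
print(ratio, weights)
```

With these weights, each class contributes the same total weight to a weighted loss, which counteracts the bias toward the majority class described above.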
Continuous growth in software, hardware, and internet technology has enabled the growth of internet-based sensor tools that provide physical-world observations and data measurements. The Internet of Things (IoT) is made up of billions of smart things that communicate, further extending the boundaries between the physical and virtual entities of the world. These intelligent things produce or collect massive amounts of data daily across a broad range of applications and fields. Analytics on this data is a critical tool for discovering new knowledge, foreseeing future knowledge, and making control decisions, which makes IoT a worthy business paradigm and an enhancing technology. Deep learning has been used in a variety of projects involving IoT and mobile apps, with encouraging early results. With its data-driven, anomaly-based methodology and its capacity to detect developing, unexpected attacks, deep learning may deliver cutting-edge solutions for IoT intrusion detection. In this paper, the increased amount of information gathered or produced is used to further develop intelligence and application capabilities through Deep Learning (DL) techniques. Many researchers have been attracted to the various fields of IoT, and both DL and IoT techniques have been studied. Different studies have suggested DL as a feasible solution for managing data produced by IoT, because it is designed to handle a variety of data in large amounts that require almost real-time processing. We start with an introduction to IoT, data generation, and data processing. We also discuss the various DL approaches and their procedures. We survey and summarize major reported efforts on DL in the IoT domain on various datasets. The features, applications, and challenges of using DL to empower IoT applications are also discussed, and can motivate and inspire further developments in this promising field.
In today’s world, diabetic retinopathy is a very severe health issue that affects people of different age groups. High levels of blood sugar can quickly damage the minuscule blood vessels in the retina, which may further lead to retinal detachment and sometimes even to glaucoma blindness. If diabetic retinopathy is diagnosed at an early stage, many affected people will not lose their vision, and human lives can also be saved. Several machine learning and deep learning methods have been applied to the available diabetic retinopathy data sets, but they were unable to provide better results in terms of accuracy in preprocessing and in optimizing the classification and feature extraction process. To overcome issues such as feature extraction and optimization in the existing systems, we considered the Diabetic Retinopathy Debrecen Data Set from the UCI machine learning repository and designed a deep learning model that uses principal component analysis (PCA) for dimensionality reduction and for extracting the most important features; the Harris hawks optimization algorithm is then used to further optimize the classification and feature extraction process. The results of the deep learning model with respect to specificity, precision, accuracy, and recall are very satisfactory compared to the existing systems.