A new approach for handling imbalanced dataset using ANN and genetic algorithm

Sonak, Apurva; Patankar, Ruhi; Pise, Nitin

doi:10.1109/iccsp.2016.7754521

Cited by 18 publications

(10 citation statements)

References 3 publications


“…As observed from the class feature distribution plot in figure 6.5, this dataset is highly imbalanced. Synthetic Minority Oversampling Technique (SMOTE) is employed to tackle this problem (Sonak et al., 2016). There was one feature (cd_000) that had a single value for all data (standard deviation = 0). Since it will not add much value to our model performance, it can be removed.…”

Section: Feature Engineering (mentioning)

confidence: 99%
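
The two preprocessing steps named in the statement above (dropping a zero-variance feature such as cd_000, then oversampling the minority class) can be sketched roughly as follows. This is a simplified, SMOTE-style illustration in plain Python, not the citing authors' code and not a library implementation. The helper names `drop_constant_columns` and `smote_like`, the neighbour count `k`, and the toy data are all assumptions made for the example.

```python
import random
import statistics


def drop_constant_columns(rows):
    """Remove columns whose standard deviation is 0; also return the kept indices."""
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols)
            if statistics.pstdev([row[j] for row in rows]) > 0]
    return [[row[j] for j in keep] for row in rows], keep


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours, which is the core
    idea of SMOTE. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (made up): the middle column is constant, like cd_000 in the quote.
rows = [[1.0, 7.0, 2.0], [2.0, 7.0, 1.0], [3.0, 7.0, 5.0], [4.0, 7.0, 4.0]]
reduced, kept = drop_constant_columns(rows)

minority = reduced[:3]  # assume the first three rows are the minority class
new_points = smote_like(minority, n_new=4)
```

Because every synthetic point is an interpolation between two existing minority samples, it always stays inside the region already covered by the minority class.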

Design of software-oriented technician for vehicle’s fault system prediction using AdaBoost and random forest classifiers

Thomas, Sumathi

2022

Int. J. Eng. Sci. Tech


Detecting and isolating faults on heavy-duty vehicles is very important because it helps maintain high vehicle performance, low emissions, fuel economy, and vehicle safety, and ensures efficient repair and service. These factors matter because they help reduce the overall life-cycle cost of a vehicle. The aim of this paper is to deliver a Web application model that helps a professional technician, or a vehicle user with basic automobile knowledge, assess the working condition of a vehicle and detect its faulty subsystem. The scope of the system is to visualize the data acquired from the vehicle, diagnose the faulty component using a trained fault model obtained from improved Machine Learning (ML) classifiers, and generate a report. The visualization page is built with the plotly Python package and uses selected parameters from On-Board Diagnostics (OBD) tool data. The histogram data is pre-processed with null-value imputation, standardization, and balancing methods to increase the quality of training, and is then used to train the classifiers. Finally, the classifier is tested, and performance metrics such as accuracy, precision, recall, and F1 measure are calculated from the confusion matrix. The proposed methodology for fault model prediction uses supervised algorithms such as Random Forest (RF) and ensemble algorithms such as AdaBoost, which offer reasonable accuracy and recall. The Python package joblib is used to save the model weights and reduce computation time. Google Colab is used as the Python environment because it offers versatile features, and PyCharm is used to develop the Web application.
Hence, the Web application resulting from this work can not only serve as a companion that minimizes the time and money spent on unnecessary checks for fault system detection, but also helps to quickly detect and isolate the faulty system, avoiding the propagation of errors that can lead to more dangerous cases.
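
The performance metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The sketch below shows one way to compute them in plain Python. The function name `binary_metrics` and the counts are made up for illustration and are not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for an imbalanced test set (12 faulty, 88 healthy).
metrics = binary_metrics(tp=8, fp=2, fn=4, tn=86)
```

With these counts, accuracy is 0.94 while recall is only about 0.67, which illustrates why accuracy alone can be misleading on imbalanced data and why recall is reported alongside it.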


“…Technically, a dataset is said to be imbalanced if the distribution among the classes in the dataset is not even or uniform. In this situation, one class is represented by only a small number of examples (the minority class), while the other class makes up most of the data (the majority class) [13]. As a result, predictions often lean toward the majority class.…”

Section: Introduction (unclassified)

“…The first is the data-level approach (sampling), that is, resampling the data to turn imbalanced data into balanced data. The data-level approach (sampling) can be used to modify the class distribution of the training data in order to balance it [13]; the data-level approach itself is a preprocessing stage carried out before building a machine learning model [12]. The second approach is cost-sensitive adjustment on the original data [13]; cost-sensitive learning is machine learning that takes misclassification into account [13].…”

Section: Introduction (unclassified)
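
The two approaches described in the statements above can be contrasted in a short sketch: random oversampling changes the training data, while cost-sensitive weighting leaves the data unchanged and makes errors on rare classes more expensive. The function names, the inverse-frequency weighting heuristic, and the toy labels are assumptions chosen for illustration, not the method of the cited work.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Data-level approach: duplicate random minority samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def inverse_frequency_weights(labels):
    """Cost-sensitive approach: weight each class by n / (k * count), so that
    misclassifying a rare class costs more. This is one common heuristic."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Toy, made-up dataset: 8 majority samples and 2 minority samples.
samples = list(range(10))
labels = ["majority"] * 8 + ["minority"] * 2

balanced_x, balanced_y = random_oversample(samples, labels)
weights = inverse_frequency_weights(labels)
```

Here oversampling grows the minority class from 2 to 8 samples, whereas the weighting assigns the minority class a weight of 2.5 against 0.625 for the majority class.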

Klasifikasi Dialek Pengujar Bahasa Inggris Menggunakan Random Forest

Azhar, Pardede

2021

mib


Speech recognition is an important research field that is currently widely used in various applications. However, speech recognition performance is affected by the dialect of the speaker. Therefore, dialect recognition is often used as an additional feature in speech recognition. Recognizing dialects is not easy. Currently, machine learning is widely applied to dialect recognition. One of the challenges in machine learning-based dialect recognition is class imbalance and class overlap, which affect a wide variety of classification techniques. This study applies Random Forest-based oversampling technology for dialect recognition. For hyper-parameter optimization of the Random Forest algorithm, we apply the Grid Search method. Experiments on Speech Accent Archive data using MFCC features resulted in an accuracy of 0.91 and an AUC of 0.95.


“…Class imbalance problems have gained significant attention in the ML community recently [32,33]. Kotsiantis et al. [34] used different tools and techniques to handle class imbalance, and Sonak et al. [35] analyzed several different methods for class imbalance problems. Classification becomes more cumbersome as data size increases, due to unbounded and unbalanced data quality.…”

Section: Background and Related Work (mentioning)

confidence: 99%

A Two-Stage Big Data Analytics Framework with Real World Applications Using Spark Machine Learning and Long Short-Term Memory Network

2018


Every day we experience unprecedented data growth from numerous sources, which contribute to big data in terms of volume, velocity, and variability. These datasets impose great challenges on analytics frameworks and computational resources, making it difficult to extract meaningful information in a timely manner. Thus, developing an efficient big data analytics framework is an important research topic. To address these challenges by exploiting non-linear relationships in very large and high-dimensional datasets, machine learning (ML) and deep learning (DL) algorithms are being used in analytics frameworks. Apache Spark has been in use as a fast big data processing engine, which helps to solve iterative ML tasks using its distributed ML library, Spark MLlib. Considering real-world research problems, DL architectures such as Long Short-Term Memory (LSTM) are an effective approach to overcoming practical issues such as reduced accuracy, long-term sequence dependency, and vanishing and exploding gradients in conventional deep architectures. In this paper, we propose an efficient analytics framework, which is technically a progressive machine learning technique merged with Spark-based linear models, Multilayer Perceptron (MLP), and LSTM, using a two-stage cascade structure in order to enhance predictive accuracy. Our proposed architecture enables us to organize big data analytics in a scalable and efficient way. To show the effectiveness of our framework, we applied the cascading structure to two different real-life datasets to solve a multiclass and a binary classification problem, respectively. Experimental results show that our analytical framework outperforms state-of-the-art approaches with a high level of classification accuracy.
