2021
DOI: 10.1109/access.2021.3059298

A Hybrid Feature Selection Method RFSTL for Manufacturing Quality Prediction Based on a High Dimensional Imbalanced Dataset

Abstract: Under Industry 4.0, manufacturing quality prediction has been gaining increased interest from researchers and manufacturers. From the analysis of previous studies on quality predictions using machine learning, it became clear that the high dimensionality and imbalance of data are major and common problems affecting the learning performance. This work uses a hybrid method to address this issue, applying a Synthetic Minority Oversampling Technique & TomekLinks balancing approach to create balanced data and using…
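
The balancing step named in the abstract (SMOTE oversampling followed by Tomek-link cleaning) can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the function names (`smote`, `tomek_links`, `smote_tomek`), the binary-label setting, the Euclidean distance, and the choice to drop both members of each Tomek link are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different labels."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: euclidean(X[i], X[j]),
        )

    nn = [nearest(i) for i in range(len(X))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def smote_tomek(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class up to the majority count (binary
    labels assumed), then drop both members of every Tomek link."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max((len(X) - len(minority)) - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(X_bal, y_bal) for i in pair}
    keep = [i for i in range(len(X_bal)) if i not in drop]
    return [X_bal[i] for i in keep], [y_bal[i] for i in keep]
```

Calling `smote_tomek(X, y, minority_label=1)` on a list of feature tuples `X` and labels `y` returns the rebalanced samples and labels. The nearest-neighbour search here is quadratic in the number of samples, so a library implementation would be the practical choice for a high-dimensional dataset.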

Cited by 20 publications (5 citation statements)
References 41 publications
“…And finally, high dimensional data is prone for overfitting. To ameliorate the issues related to high dimensional data the it is often advised to reduce the number of features [23].…”
Section: Data Driven Methods (mentioning)
confidence: 99%
“…When dataset has a high-dimensional predictors, feature selection (Xu et al, 2020) or feature extraction (Lee and Seo, 2020) are mostly applied. Zhou et al (2021) propose a hybrid feature selection method for a high dimensional imbalanced dataset. Attenberg and Provost (2010) suggest an alternative scheme to address the extreme class imbalance problem by deploying low-cost human resources for data acquisition, in which active learning is not efficient, called guided learning.…”
Section: Related Work (mentioning)
confidence: 99%
“…The EMR [114] is usually implemented in clinicians' offices, clinics, and hospitals to capture notes, assessments, and treatment records cross-sectionally and longitudinally for diagnosis and treatment. [74] extracted 89 features from longitudinal retrospective EMR data and shortlisted 20 features using RF Gini impurity [115] scores and SMOTE [115] to upsample and overcome the class imbalance in the ASD dataset. The LR predicted ASD risk with an AUC of 0.727.…”
Section: Assessments Datasets and EMR Analysis (mentioning)
confidence: 99%