A Hybrid Feature Selection Method RFSTL for Manufacturing Quality Prediction Based on a High Dimensional Imbalanced Dataset

Zhou, Hong; Yu, Kun-Ming; Chen, Yen-Chiu; Hsu, Huan-Po

doi:10.1109/access.2021.3059298

Cited by 20 publications

(5 citation statements)

References 41 publications


“…And finally, high dimensional data is prone for overfitting. To ameliorate the issues related to high dimensional data the it is often advised to reduce the number of features [23].…”

Section: Data Driven Methods (mentioning)

confidence: 99%

Machine learning methods in time series forecasting: a review

Kamolov, Iskhakov, Ziyaev

2021

AMCS



“…When dataset has a high-dimensional predictors, feature selection (Xu et al, 2020) or feature extraction (Lee and Seo, 2020) are mostly applied. Zhou et al (2021) propose a hybrid feature selection method for a high dimensional imbalanced dataset. Attenberg and Provost (2010) suggest an alternative scheme to address the extreme class imbalance problem by deploying low-cost human resources for data acquisition, in which active learning is not efficient, called guided learning.…”

Section: Related Work (mentioning)

confidence: 99%

Performance-based active learning for skewed data with nonparametric logistic regression

Lee, Seo

2023

Preprint


Real-world data often exhibit skewed distribution with a long tail, where certain target values have significantly fewer observations rather than preserving an ideal uniform distribution over each category, which substantially affects model performance for classification problems. Furthermore, parametric logistic regression provides a fundamental classification model with ease of interpretation; however, it is doubtful that the logit function of classification is truly linear in covariates. This research proposes the performance-based active learning (PbAL) scheme with nonparametric logistic regression to address the imbalance problem considering the nonlinear decision boundary. The PbAL is applied to choose the most informative samples in a sequential manner with an imbalanced dataset by directly evaluating a performance metric on a pool set. The nonparametric logistic regression model with smoothing splines is used to achieve a flexible classification boundary. The experiments show that PbAL outperforms traditional active learning approaches based on D-optimality and A-optimality. It is also shown that the proposed method provides superior outcomes compared to the other resampling techniques used for imbalanced classification problems, such as Tomek Link and SMOTE, even with a smaller sample size. This result suggests that PbAL effectively mitigates the bias, which severely influences the model performance with small amounts of initial training data.


“…The EMR [114] is usually implemented in clinicians' offices, clinics, and hospitals to capture notes, assessments, and treatment records cross-sectionally and longitudinally for diagnosis and treatment. [74] extracted 89 features from longitudinal retrospective EMR data and shortlisted 20 features using RF Gini impurity [115] scores and SMOTE [115] to upsample and overcome the class imbalance in the ASD dataset. The LR predicted ASD risk with an AUC of 0.727.…”

Section: Assessments Datasets and EMR Analysis (mentioning)

confidence: 99%

The Role of Intelligent Technologies in Early Detection of Autism Spectrum Disorder (ASD): A Scoping Review

2022


Background: Two-year delay is reported between the first developmental concern raised by the parents and the diagnosis of ASD (Autism Spectrum Disorder), delaying the start of early intervention programs most beneficial within the first three years. Aim: Evaluate the role of technology in ASD detection by answering four research questions analyzing 1) evolution of technology, 2) use of various bio-behavioral data sources, 3) demographic categories, databases, controls, comparators, and assessment instruments, and 4) data collection, processing, and outcomes of the technology-based methods in ASD detection. Methods: Scoping review included behavioral-based ASD screening and diagnostic studies, published between 1st January 2011 to 31st December 2021 in PUBMED, SCOPUS, and IEEE Xplore databases for children under six years. The studies were assessed using the Critical Appraisal Skills Programme (CASP) and the PRISMA scoping review checklist (PRISMA-ScR). Results: The shortlisted 35 studies were categorized into seven bio-behavioral categories. The review suggested extensive use of machine learning (ML) and Deep Learning (DL) technologies with multimodal structured and unstructured data to detect infants at risk of ASD and Other developmental delays (ODD) as early as 9 to 12 months. However, the review reported various internal and external validity threats. Conclusion: Technology can significantly improve the current ASD detection process. The validation and adoption of technology can be fast-tracked by 1) designing robust study protocols, 2) executing multicultural field trials, 3) standardizing datasets, data quality and feature engineering methods, 4) recruiting statistically significant participants from ASD, typically developing (TD) and other developmental disorders (ODD) groups to ensure technological generalization, validation, and adoption outside laboratory settings.
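
The citation statement for this entry mentions SMOTE being used to upsample a minority class. As a rough illustration only, the interpolation idea behind SMOTE can be sketched in plain Python. This is not the code of any cited study, and it is simplified relative to the published SMOTE algorithm (it draws base samples at random instead of generating a fixed number per minority sample). The function name `smote_like_oversample`, the toy data, and the default `k=3` are assumptions made for this example.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy imbalanced data (made up): 8 majority samples, 3 minority samples.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic minority points to match the majority count.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. In practice a maintained library implementation would normally be preferred over a hand-written sketch like this one.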

