Prediction and Variable Selection in High-Dimensional Misspecified Binary Classification

Furmańczyk, Konrad; Rejchel, Wojciech

doi:10.3390/e22050543

Cited by 5 publications

(3 citation statements)

References 35 publications

“…However, the recent availability of more informative databases obtained from EHR opens up new research opportunities. These current databases containing records of hundreds of thousands of appointments allow the use of modern predictive techniques such as deep neural networks or novel binary classification algorithms for high-dimensional settings, such as [ 65 , 66 ]. A second research line consists of developing and incorporating strategies that reduce the negative effects of class imbalance.…”

Section: Discussion

mentioning

confidence: 99%

Patient No-Show Prediction: A Systematic Literature Review

Carreras-García, Delgado-Gómez, Llorente, et al. 2020

Entropy

Nowadays, among the most important problems faced by health centers are those caused by patients who do not attend their appointments. Among other effects, these patients cause loss of revenue for the health centers and lengthen patient waiting lists. To tackle these problems, several scheduling systems have been developed. Many of them require predicting whether a patient will show up for an appointment. However, obtaining these estimates accurately remains a challenging problem. In this work, a systematic review of the literature on predicting patient no-shows is conducted, aiming to establish the current state of the art. Based on a systematic review following the PRISMA methodology, 50 articles were found and analyzed. Of these articles, 82% were published in the last 10 years, and the most used technique was logistic regression. In addition, there is significant growth in the size of the databases used to build the classifiers. An important finding is that only two studies achieved an accuracy higher than the show rate. Moreover, a single study attained an area under the curve greater than 0.9. These facts indicate the difficulty of this problem and the need for further research.

“…Ref. [4], similarly to [2], deals with the classification problem of a binary variable under misspecification. It focuses on establishing a general upper bound of excess risk, i.e., the difference between the risk of the linear classifier $\beta^{T}x$, obtained as a minimizer of the penalized empirical risk pertaining to convex function φ, and the Bayes risk in such a case (Theorem 1).…”

mentioning

confidence: 99%

Nonparametric Statistical Inference with an Emphasis on Information-Theoretic Methods

Mielniczuk

2022

Entropy

“…
• Rendall et al [26] - extensive comparison of large scale data driven prediction methods based on VS and machine learning;
• Marcjasz et al [27] - to electricity price forecasting;
• Santi et al [28] - to predict mathematics scores of students;
• Karim et al [29] - to predict post-operative outcomes of cardiac surgery patients;
• Kim and Kang [30] - to faulty wafer detection in semiconductor manufacturing;
• Furmańczyk and Rejchel [31] - to high-dimensional binary classification problems;
• Fouad and Loáiciga [5] - to predict percentile flows using inflow duration curve and regression models;
• Ata Tutkun and Kayhan Atilgan [32] - investigated VS models in Cox regression, a multivariate model;
• Mehmood et al [33] - compared several VS approaches in partial least-squares regression tasks;
…”

mentioning

confidence: 99%

Selection of Temporal Lags for Predicting Riverflow Series from Hydroelectric Plants Using Variable Selection Methods

et al. 2020

The forecasting of monthly seasonal streamflow time series is an important issue for countries where hydroelectric plants contribute significantly to electric power generation. The main step in planning the electric sector’s operation is to predict such series to anticipate behaviors and issues. In general, several proposals in the literature focus only on determining the best forecasting models. However, the correct selection of input variables is an essential step for forecasting accuracy; in a univariate model, these inputs are the lags of the time series to be forecast. This task can be solved by variable selection methods, since the performance of the predictors is directly related to this stage. In the present study, we investigate the performance of linear and non-linear filters, wrappers, and bio-inspired metaheuristics, totaling ten approaches. The addressed predictors are the extreme learning machine neural networks, representing the non-linear approaches, and the autoregressive linear models from the Box and Jenkins methodology. The computational results for five series from hydroelectric plants indicate that the wrapper methodology is adequate for the non-linear method, and the linear approaches are better adjusted using filters.
