Andrea Cappozzo scite author profile

In a standard classification framework, a set of trustworthy learning data is employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Unreliable labelled observations, namely outliers and data with incorrect labels, can therefore strongly undermine classifier performance, especially if the training set is small. The present work introduces a robust modification to the Model-Based Classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalue of the group scatter matrices. The proposed method effectively handles noise in both the response and the explanatory variables, providing reliable classification even when dealing with contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, are provided to underline the benefits of the proposed method.

Anomaly and Novelty detection for robust semi-supervised learning

Cappozzo

Greselin

Murphy

2020

Stat Comput

A blood DNA methylation biomarker for predicting short-term risk of cardiovascular events

et al. 2022

Background: Recent evidence highlights the epidemiological value of blood DNA methylation (DNAm) as a surrogate biomarker for exposure to risk factors for non-communicable diseases (NCDs). In many cases, DNAm surrogates of exposures predict diseases and longevity better than self-reported or measured exposures. Consequently, disease prediction models based on blood DNAm surrogates may outperform current state-of-the-art prediction models. This study aims to develop novel DNAm surrogates for cardiovascular disease (CVD) risk factors and a composite biomarker predictive of CVD risk. We compared the prediction performance of our newly developed risk score with the state-of-the-art DNAm risk scores for cardiovascular diseases, the ‘next-generation’ epigenetic clock DNAmGrimAge, and SCORE2, the prediction model based on traditional risk factors.

Results: Using data from the EPIC Italy cohort, we derived novel DNAm surrogates for BMI, blood pressure, fasting glucose and insulin, cholesterol, triglycerides, and coagulation biomarkers. We validated them in four independent data sets from Europe and the USA. Further, we derived a DNAmCVDscore predictive of the time-to-CVD event as a combination of several DNAm surrogates. ROC curve analyses show that DNAmCVDscore outperforms previously developed DNAm scores for CVD risk and SCORE2 for short-term CVD risk. Interestingly, the performance of DNAmGrimAge and DNAmCVDscore was comparable (slightly lower for DNAmGrimAge, although the differences were not statistically significant).

Conclusions: We described novel DNAm surrogates for CVD risk factors useful for future molecular epidemiology research, and we described a blood DNAm-based composite biomarker, DNAmCVDscore, predictive of short-term cardiovascular events. Our results highlight the usefulness of DNAm surrogate biomarkers of risk factors in epigenetic epidemiology to identify high-risk populations. In addition, we provide further evidence on the effectiveness of prediction models based on DNAm surrogates and discuss methodological aspects for further improvements. Finally, our results encourage testing this approach for other NCDs by training and developing DNAm surrogates for disease-specific risk factors and exposures.

Robust variable selection in the framework of classification with label noise and outliers: Applications to spectroscopic data in agri-food

Cappozzo

Duponchel

Greselin

et al. 2021

Analytica Chimica Acta

Robust variable selection for model-based learning in presence of adulteration

Cappozzo

Greselin

2021

Computational Statistics & Data Analysis

Andrea Cappozzo

A robust approach to model-based classification based on trimming and constraints
