Enabling interpretable machine learning for biological data with reliability scores

Ahlquist, K D; Sugden, Lauren Alpert; Ramachandran, Sohini

doi:10.1371/journal.pcbi.1011175

Cited by 1 publication

(1 citation statement)

References 66 publications

“…The differences in feature importance between the XGBoost and ANN models could reflect fundamental differences in their data processing methodologies [90]. This underlines the importance of interpretability and reliability in ML models, especially in domains where decision making is closely tied to model outputs [91].…”

Section: Discussion (mentioning)

confidence: 99%

A Comparative Analysis of XGBoost and Neural Network Models for Predicting Some Tomato Fruit Quality Traits from Environmental and Meteorological Data

M’hamdi, Takács, Palotás et al. 2024

Plants

The tomato as a raw material for processing is globally important and is pivotal in dietary and agronomic research due to its nutritional, economic, and health significance. This study explored the potential of machine learning (ML) for predicting tomato quality, utilizing data from 48 cultivars and 28 locations in Hungary over 5 seasons. It focused on °Brix, lycopene content, and colour (a/b ratio) using extreme gradient boosting (XGBoost) and artificial neural network (ANN) models. The results revealed that XGBoost consistently outperformed ANN, achieving high accuracy in predicting °Brix (R² = 0.98, RMSE = 0.07) and lycopene content (R² = 0.87, RMSE = 0.61), and excelling in colour prediction (a/b ratio) with an R² of 0.93 and RMSE of 0.03. ANN lagged behind particularly in colour prediction, showing a negative R² value of −0.35. Shapley additive explanations (SHAP) summary plot analysis indicated that both models are effective in predicting °Brix and lycopene content in tomatoes, highlighting different aspects of the data. SHAP analysis highlighted the models’ efficiency (especially in °Brix and lycopene predictions) and underscored the significant influence of cultivar choice and environmental factors like climate and soil. These findings emphasize the importance of selecting and fine-tuning the appropriate ML model for enhancing precision agriculture, underlining XGBoost’s superiority in handling complex agronomic data for quality assessment.
