Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning

Morita, Katsuaki; Mizuno, Tadahaya; Kusuhara, Hiroyuki

doi:10.1021/acs.jcim.2c00765

Cited by 2 publications

(3 citation statements)

References 48 publications


“…Therefore, relying on a single dataset split to create a model may pose challenges to the model’s representativeness and reliability. The performance of models established on small datasets would be obviously impacted by dataset size, split ratio, and split strategy, and machine learning may not capture the full range of features and patterns present in the given training set. This underscores the importance of evaluating model performance and interpreting what a model learned across various splits.…”

Section: Results (mentioning)

confidence: 99%

Averaging Strategy for Interpretable Machine Learning on Small Datasets to Understand Element Uptake after Seed Nanotreatment

Yu, Tang, et al. 2023

Environ. Sci. Technol.


Understanding plant uptake and translocation of nanomaterials is crucial for ensuring the successful and sustainable applications of seed nanotreatment. Here, we collect a dataset with 280 instances from experiments for predicting the relative metal/metalloid concentration (RMC) in maize seedlings after seed priming by various metal and metalloid oxide nanoparticles. To obtain unbiased predictions and explanations on small datasets, we present an averaging strategy and add a dimension for interpretable machine learning. The findings in post-hoc interpretations of sophisticated LightGBM models demonstrate that solubility is highly correlated with model performance. Surface area, concentration, zeta potential, and hydrodynamic diameter of nanoparticles and seedling part and relative weight of plants are dominant factors affecting RMC, and their effects and interactions are explained. Furthermore, self-interpretable models using the RuleFit algorithm are established to successfully predict RMC only based on six important features identified by post-hoc explanations. We then develop a visualization tool called RuleGrid to depict feature effects and interactions in numerous generated rules. Consistent parameter-RMC relationships are obtained by different methods. This study offers a promising interpretable data-driven approach to expand the knowledge of nanoparticle fate in plants and may profoundly contribute to the safety-by-design of nanomaterials in agricultural and environmental applications.


“…When it comes to using adaptive models and assessing their adaptive behavior there are a number of strategies and approaches being used across the field of AI (Groce et al, 2002; Yang et al, 2005; Xiao et al, 2016; López and Tucker, 2018). Currently, a random split cross-validation model is considered the ML standard for model building and evaluation (Morita et al, 2022). Random split cross-validation is often found to be overoptimistic in comparison to real-world situations, while a time-split approach is considered suitable for real-world prediction (Morita et al, 2022).…”

Section: Discussion (mentioning)

confidence: 99%

“…Currently, a random split cross-validation model is considered the ML standard for model building and evaluation (Morita et al, 2022). Random split cross-validation is often found to be overoptimistic in comparison to real-world situations, while a time-split approach is considered suitable for real-world prediction (Morita et al, 2022). In this study, we proposed a time-split adaptability framework approach to exploring the adaptive behavior of an AI-based solution for drug toxicity and risk assessments within regulatory science.…”

Section: Discussion (mentioning)

confidence: 99%

Adaptability of AI for safety evaluation in regulatory science: A case study of drug-induced liver injury

Connor, Li, Roberts, et al. 2022

Front. Artif. Intell.


Artificial intelligence (AI) has played a crucial role in advancing biomedical sciences but has yet to have the impact it merits in regulatory science. As the field advances, in silico and in vitro approaches have been evaluated as alternatives to animal studies, in a drive to identify and mitigate safety concerns earlier in the drug development process. Although many AI tools are available, their acceptance in regulatory decision-making for drug efficacy and safety evaluation is still a challenge. It is a common perception that an AI model improves with more data, but does reality reflect this perception in drug safety assessments? Importantly, a model aiming at regulatory application needs to take a broad range of model characteristics into consideration. Among them is adaptability, defined as the adaptive behavior of a model as it is retrained on unseen data. This is an important model characteristic which should be considered in regulatory applications. In this study, we set up a comprehensive study to assess adaptability in AI by mimicking the real-world scenario of the annual addition of new drugs to the market, using a model we previously developed known as DeepDILI for predicting drug-induced liver injury (DILI) with a novel Deep Learning method. We found that the target test set plays a major role in assessing the adaptive behavior of our model. Our findings also indicated that adding more drugs to the training set does not significantly affect the predictive performance of our adaptive model. We concluded that the proposed adaptability assessment framework has utility in the evaluation of the performance of a model over time.
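
The contrast drawn in the citation statements above, between a random split and a time split, can be illustrated with a minimal sketch. This is not code from any of the cited papers: the record layout, the `approved` field, the helper names `random_split` and `time_split`, and the toy data are all hypothetical, chosen only to show how the two strategies partition the same records.

```python
import random
from datetime import date


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a random fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def time_split(records, cutoff):
    """Train on records dated before the cutoff; test on the rest."""
    train = [r for r in records if r["approved"] < cutoff]
    test = [r for r in records if r["approved"] >= cutoff]
    return train, test


# Hypothetical toy data: one record per year, 2000 through 2009.
records = [
    {"drug": f"drug_{i}", "approved": date(2000 + i, 1, 1)}
    for i in range(10)
]

rand_train, rand_test = random_split(records)
time_train, time_test = time_split(records, cutoff=date(2008, 1, 1))

# With a time split, every training record predates every test record,
# so the evaluation never uses information from the "future".
latest_train = max(r["approved"] for r in time_train)
earliest_test = min(r["approved"] for r in time_test)
print(latest_train < earliest_test)
```

In the random split, test records can predate training records, which is one plausible reason the statements above describe it as overoptimistic relative to real-world use; the time split enforces the chronological ordering by construction.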
