Developing a generalized model for robust prediction of nanotoxicity
is critical for designing safe nanoparticles. However, complex toxicity
mechanisms of nanoparticles in biological environments, such as biomolecular
corona formation, hinder reliable nanotoxicity prediction. This
difficulty is compounded by the evaluation bias that can arise from
relying on internal validation alone, an issue that is not yet fully
appreciated. Herein, we propose an
evidence-based prediction method for distinguishing between cytotoxic
and noncytotoxic nanoparticles under a given condition by combining
literature data mining and machine learning. We illustrate the proposed method
for amorphous silica nanoparticles (SiO2-NPs), which are currently
considered a safety concern yet are still widely produced and used in
various consumer products. We
compiled the most diverse set of SiO2-NP cellular toxicity attributes
to date from >100 publications and built predictive models
with algorithms ranging from linear to nonlinear (deep neural network,
kernel, and tree-based) classifiers. These models were validated using
internal (4124-sample) and external (905-sample) data sets. The
categorical boosting (CatBoost) model outperformed the other algorithms.
Using SHapley Additive exPlanations (SHAP) values from the CatBoost
model, we then identified 13 key attributes, including concentration, serum,
cell, size, time, surface, and assay type, that can explain SiO2-NP
toxicity. The serum attribute underscores the importance
of nanoparticle–corona complexes for nanotoxicity prediction.
We further show that internal validation does not guarantee generalizability.
In general, safe SiO2-NPs can be obtained by modifying
their surfaces and using low concentrations. Our work provides a strategy
for predicting and explaining the toxicity of any type of engineered
nanoparticle in real-world practice.
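
The point that internal validation does not guarantee generalizability can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy concentrations and labels are invented (they are not data from this study), the single-cutoff rule `threshold_rule` is a hypothetical stand-in for a trained classifier such as CatBoost, and balanced accuracy is chosen here only as one reasonable metric, not necessarily the one used in this work.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = cytotoxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def threshold_rule(concentrations: Sequence[float], cutoff: float) -> list:
    """Hypothetical stand-in classifier: predict cytotoxic at or above the cutoff."""
    return [1 if c >= cutoff else 0 for c in concentrations]


# Invented toy data (NOT from the study). The "internal" set resembles the
# data the cutoff was tuned on; the "external" set mimics other publications
# where conditions such as serum content change the dose response.
internal_conc = [10, 25, 50, 100, 200, 400]
internal_label = [0, 0, 0, 1, 1, 1]
external_conc = [10, 50, 100, 200, 400, 800]
external_label = [0, 1, 0, 0, 1, 1]

cutoff = 100  # assumed to have been tuned on the internal data only
internal_score = balanced_accuracy(
    internal_label, threshold_rule(internal_conc, cutoff)
)
external_score = balanced_accuracy(
    external_label, threshold_rule(external_conc, cutoff)
)
gap = internal_score - external_score

print(f"internal={internal_score:.2f} external={external_score:.2f} gap={gap:.2f}")
```

In this toy case the rule separates the internal samples perfectly but performs at chance level on the external ones, which is the kind of gap that only an external data set can reveal.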