2018
DOI: 10.1038/s41598-018-21431-9

Toxicity Classification of Oxide Nanomaterials: Effects of Data Gap Filling and PChem Score-based Screening Approaches

Abstract: Development of nanotoxicity prediction models is becoming increasingly important in the risk assessment of engineered nanomaterials. However, it has significant obstacles caused by the wide heterogeneities of published literature in terms of data completeness and quality. Here, we performed a meta-analysis of 216 published articles on oxide nanoparticles using 14 attributes of physicochemical, toxicological and quantum-mechanical properties. Particularly, to improve completeness and quality of the extracted da…

Citations: cited by 46 publications (56 citation statements)
References: 43 publications
“…For example, Ban et al [80] used curve-fitting to calculate missing ages based on the age-weight relationships of different species. While assessing data quality and completeness, nano-specific filling in of missing values using manufacturer's specifications and/or estimations [60,64] was suggested within the Safe and Sustainable Nanotechnology (S2NANO) (http://portal.s2nano.org/ (Webpage accessed autumn 2019)) database. Furxhi et al [72] investigated the robustness of several ML tools on generated versions of the dataset by removing values artificially.…”
Section: Missing Values (mentioning)
confidence: 99%
“…They found that the model captured nonlinear dependence between descriptors and cytotoxicity as well as possible interactions. RF is an ML recursive ensemble algorithm based on a combination of independently grown binary decision trees constructed with various samples of a bootstrap [64]. By aggregating the predictions of each tree, the RF algorithm makes forecasts depending significantly on two model parameters.…”
Section: Model Implementation (mentioning)
confidence: 99%
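The random forest idea in the statement above (trees grown on bootstrap samples, predictions aggregated by vote) can be illustrated with a minimal sketch. This is not the cited authors' implementation: the trees are reduced to one-split stumps for brevity, all function names are hypothetical, and the two tuning parameters shown (`n_trees` and `max_features`) are an assumption, since the quoted text is cut off before it names them.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Fit a one-split tree: pick the (feature, threshold) with the fewest errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No valid split (e.g. a one-class bootstrap sample): always predict the majority.
        m = majority(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Grow each stump on its own bootstrap sample and random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(n_features), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Aggregate the per-tree predictions by majority vote."""
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)
```

A production model would grow full trees and typically rely on an established library rather than a hand-written sketch like this one.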
“…material safety data sheets) which resulted in a second dataset (PD). Density values derived from manufacturers' information combined with particle size distribution data (assuming spherical shape and smooth surface) can be used to calculate the Specific Surface Area (SSA) (Ha et al 2018). Furthermore, as well as data gap filling, discretization was also performed during preprocessing.…”
Section: Data Preprocessing (mentioning)
confidence: 99%
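The SSA estimate mentioned in the statement above follows from sphere geometry: a smooth sphere of diameter d and density ρ has surface area per unit mass 6/(ρd). The sketch below is an illustration under that stated assumption, not the cited authors' code. The function names are hypothetical, and using the surface-weighted (Sauter) mean diameter for a size distribution is one reasonable choice that the quoted text does not specify.

```python
def ssa_monodisperse(diameter_nm, density_g_cm3):
    """SSA in m^2/g for smooth spheres of a single diameter.

    SSA = 6 / (rho * d). With rho in g/cm^3 and d in nm, the unit
    conversion contributes a factor of 1000, hence 6000 / (rho * d).
    """
    if diameter_nm <= 0 or density_g_cm3 <= 0:
        raise ValueError("diameter and density must be positive")
    return 6000.0 / (density_g_cm3 * diameter_nm)


def ssa_from_distribution(diameters_nm, counts, density_g_cm3):
    """SSA in m^2/g for a number-weighted size distribution of smooth spheres.

    Total area scales with sum(n * d^2) and total volume with sum(n * d^3),
    so the ratio volume/area gives the Sauter mean diameter.
    """
    total_area = sum(n * d ** 2 for d, n in zip(diameters_nm, counts))
    total_volume = sum(n * d ** 3 for d, n in zip(diameters_nm, counts))
    if total_area <= 0:
        raise ValueError("distribution must contain at least one particle")
    sauter_diameter_nm = total_volume / total_area
    return ssa_monodisperse(sauter_diameter_nm, density_g_cm3)
```

For instance, with purely illustrative inputs of 10 nm and 6.0 g/cm^3, `ssa_monodisperse(10.0, 6.0)` evaluates to 100 m^2/g. Real particles with rough or porous surfaces will generally have a larger measured SSA than this geometric estimate.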
“…Most instances in the training dataset correspond to no effect outcomes for the majority of the endpoints. Imbalanced datasets can limit the performance of most classification algorithms, making the prediction biased to the dominant class value (Ha et al 2018). To avoid this, we adjusted the relative frequency of triggered/ no effect instances by resampling the second dataset by applying SMOTE (Synthetic Minority Oversampling Technique), a supervised instance algorithm that oversamples the minority instances using the k-nearest neighbors algorithm (Chawla et al 2002).…”
Section: Data Split and Balancing (mentioning)
confidence: 99%
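The core step of SMOTE described in the statement above (interpolating between a minority instance and one of its k nearest minority neighbours) can be sketched in a few lines. This is a simplified illustration with a hypothetical function name, not the implementation used by the citing authors, and it omits details such as per-sample oversampling ratios.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    minority: list of equal-length numeric feature vectors from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies on a line segment between two real minority samples, the new instances stay inside the region the minority class already occupies. Oversampling is normally applied to the training split only, so that synthetic points do not leak into evaluation data.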
“…The structure–activity relationship (Liu et al, ), perturbation (Kleandrova et al, ; Kleandrova et al, ; Luan et al, ; Speck‐Planche, Kleandrova, Luan, & Cordeiro, ), quasi‐SMILES‐ (Trinh et al, ), and theoretical descriptors‐based (Boukhvalov & Yoon, ) models have been established in the prediction of toxicity of nanoparticles. Some QSARs models relating physicochemical descriptors to cellular responses of nanomaterials based on the multivariate analysis (Le, Epa, Burden, & Winkler, ), including principal component (PC) analysis (PCA; Sayes & Ivanov, ; Lynch, Weiss, & Valsami‐Jonesa, ), hierarchical clustering (Shaw et al, ), linear discriminant analysis (Sayes & Ivanov, ), artificial neural network (Winkler et al, ), support vector machine (SVM; Fourches et al, ), naïve Bayes, k‐nearest neighbour (Chau & Yap, ), linear and nonlinear regression analysis (Can, ; Chau & Yap, ), PChem score‐based screening and data imputation approaches (Ha et al, ), and artificial neural network, random forest, SVM, and generalized linear models (Choi, Ha, Trinh, Yoon, & Byun, ), so forth were discussed in the previous studies. A short summary of some latest development in toxicity modelling of nanomaterials is presented in Table .…”
Section: Introduction (mentioning)
confidence: 99%