2013
DOI: 10.1186/1758-2946-5-9
Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons

Abstract: Background: One of the main topics in the development of quantitative structure-property relationship (QSPR) predictive models is the identification of the subset of variables that represent the structure of a molecule and are predictors for a given property. There are several automated feature selection methods, ranging from backward, forward, or stepwise procedures to more elaborate methodologies such as evolutionary programming. The problem lies in selecting the minimum subset of descriptors that ca…

Cited by 61 publications (62 citation statements)
References 65 publications
“…The method therefore arrives at a non-linear consensus over unpruned decision trees. Two important parameters (although the method is not very sensitive to their values) are the number of trees to grow in the forest (ntree) and the number of variables considered at each node (mtry) [60,62]. With RF there is a reduced risk of overfitting, since the approach aggregates a large number of simple models, and it can handle non-standard problems (more descriptors than observations).…”
Section: Random Forest
confidence: 99%
“…A popular approach in the literature is to apply tools from machine learning on certain DFT calculations to accelerate prediction of various properties of compounds [16][17][18][19][20][21][22] . Ideas from machine learning have been coupled with databases of ab initio calculations to estimate molecular electronic properties in chemical compound space, including the enthalpy of formation of compounds 23,24 . However, these methods still have the major disadvantage of requiring results from many DFT calculations, which may not be possible for alloys without given crystal structures, i.e., amorphous or noncrystalline alloys.…”
Section: Introduction
confidence: 99%
“…Random forest (RF) was used for feature selection. RF is a popular and efficient algorithm based on model aggregation, applicable to both classification and regression problems [37]. RF was implemented via the “Learn R Forest Model” component in Pipeline Pilot 8.5, invoking the R package “randomForest”.…”
Section: Methods
confidence: 99%
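The feature-selection workflow described in this last excerpt, ranking descriptors by RF importance and retaining the top-ranked subset, can be sketched as follows. This is a hedged illustration in scikit-learn, not the Pipeline Pilot / R pipeline the citing study actually used, and the data and cutoff of five descriptors are assumptions for demonstration:

```python
# Sketch of RF-based feature selection: rank descriptors by
# impurity-based importance and keep the top k (k = 5 here, an
# illustrative choice). The cited study did this via Pipeline Pilot
# calling R's randomForest; scikit-learn is used here as a stand-in.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 40 candidate descriptors, 5 truly informative.
X, y = make_regression(n_samples=300, n_features=40, n_informative=5,
                       random_state=1)

rf = RandomForestRegressor(n_estimators=300, random_state=1).fit(X, y)

# Importances are normalized to sum to 1; sort descending and keep 5.
top5 = np.argsort(rf.feature_importances_)[::-1][:5]
X_selected = X[:, top5]
print(X_selected.shape)  # (300, 5)
```

The reduced descriptor matrix would then be passed to whatever downstream QSPR model is being fitted, which is the role RF plays as a pre-filter in the excerpt above.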