Neural networks were widely used for quantitative structure-activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow training on large problems, difficulty of training, and proneness to overfitting), they were superseded by more robust methods such as support vector machines (SVMs) and random forests (RFs), which arose in the early 2000s. The last 10 years have witnessed a revival of neural networks in the machine learning community, thanks to new methods for preventing overfitting, more efficient training algorithms, and advances in computer hardware. In particular, deep neural nets (DNNs), i.e., neural nets with more than one hidden layer, have found great success in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large, diverse QSAR data sets taken from Merck's drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets: a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of these parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphics processing units (GPUs) makes this issue manageable.
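The overfitting control the abstract credits for the DNN revival can be illustrated with a minimal sketch: a one-hidden-layer network trained with inverted dropout on toy regression data. This assumes only NumPy; the synthetic data, layer size, dropout rate, and learning rate below are illustrative stand-ins, not the recommended parameter set from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for (descriptor, activity) pairs.
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=200)

# One hidden layer with ReLU; dropout on the hidden units during training.
n_hidden, p_drop, lr = 64, 0.25, 0.01
W1 = rng.normal(scale=0.1, size=(10, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1)); b2 = np.zeros(1)

for epoch in range(300):
    h_pre = X @ W1 + b1
    h = np.maximum(h_pre, 0.0)
    # Inverted dropout: zero random units, rescale the survivors.
    mask = (rng.random(h.shape) > p_drop) / (1.0 - p_drop)
    h_d = h * mask
    pred = (h_d @ W2 + b2).ravel()
    # Backpropagate the mean-squared-error gradient.
    g_pred = (2.0 / len(y)) * (pred - y)[:, None]
    gW2, gb2 = h_d.T @ g_pred, g_pred.sum(0)
    g_h = (g_pred @ W2.T) * mask * (h_pre > 0)
    gW1, gb1 = X.T @ g_h, g_h.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# At prediction time dropout is disabled (no mask).
final = (np.maximum(X @ W1 + b1, 0.0) @ W2 + b2).ravel()
mse = float(np.mean((final - y) ** 2))
```

Dropout randomly silences hidden units during training so the network cannot rely on any single co-adapted feature, which is one of the regularizers that made DNNs viable again for tasks like QSAR.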
Background: Immunotherapy approaches, including immune checkpoint blockade, have shown initial promising results in HCC. Anti-PD-1 therapy with pembrolizumab has demonstrated antitumor activity and manageable safety in multiple cancers. KEYNOTE-224 (NCT02702414), an open-label, phase 2 trial, assessed the efficacy and safety of pembrolizumab in pts with advanced HCC previously treated with sorafenib. Methods: Eligible pts were age ≥18 y with confirmed HCC, radiographic progression after sorafenib, disease not amenable to curative treatment, Child-Pugh A liver function, ECOG PS 0-1, and predicted life expectancy >3 mo. Pts received pembrolizumab 200 mg Q3W for 2 y or until disease progression, unacceptable toxicity, withdrawal of consent, or investigator decision. Response was assessed every 9 wk (RECIST v1.1, central review). The primary endpoint was ORR (RECIST v1.1, central review). Secondary endpoints included DOR, DCR, PFS, OS, and safety and tolerability. The data cutoff date was Aug 24, 2017. Results: Of 104 treated pts, 23 continued therapy at data cutoff (median follow-up 8.4 mo, range 0.4-13.6). Median age was 68 y (range 43-87); 21.2% of pts were HBV+, 26% were HCV+, 94.2% were Child-Pugh A, 79.8% had PD on sorafenib, and 63.5% had extrahepatic disease. ORR was 16.3% (95% CI, 9.8 to 24.9) and was similar across etiology subgroups. Median time to response was 2.1 mo (range 1.8-4.8), and 94% of responders were estimated to have a response duration ≥6 mo. Best responses were CR in 1 pt (1.0%), PR in 16 (15.4%), SD in 47 (45.2%), and PD in 34 (32.7%); DCR was 61.5%. Median PFS was 4.8 mo (95% CI, 3.4 to 6.6), and median OS was not reached (95% CI, 9.4 to not reached). The 6-mo PFS and OS rates were 43.1% and 77.9%, respectively. Treatment-related (TR) AEs occurred in 73.1% of pts; fatigue (21.2%) and increased aspartate aminotransferase (12.5%) occurred in ≥10% of pts, and grade 3-5 TRAEs occurred in 25%, including 1 death (ulcerative esophagitis).
No cases of HBV/HCV flare occurred; immune-mediated hepatitis occurred in 3 (2.9%) pts. Conclusion: Pembrolizumab treatment resulted in durable responses and favorable PFS and OS in pts with advanced HCC previously treated with sorafenib. Safety was generally comparable to that established for pembrolizumab monotherapy. Clinical trial information: NCT02702414.
In the pharmaceutical industry it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that generate the most accurate predictions without being overly expensive computationally. In this paper we compare eXtreme Gradient Boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets. While XGBoost has many adjustable parameters, we can define a set of standard parameters at which XGBoost makes predictions that are, on average, better than those of random forest and almost as good as those of deep neural nets. The biggest strength of XGBoost is its speed: whereas efficient use of random forest requires generating each tree in parallel on a cluster, and deep neural nets are usually run on GPUs, XGBoost can be run on a single CPU in less than a third of the wall-clock time of either of the other methods.
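XGBoost is a heavily optimized implementation of gradient boosting; the core idea (each new tree fits the residuals of the current ensemble) can be sketched with depth-1 regression stumps in plain NumPy. This is a toy illustration of boosting, not XGBoost's regularized objective or its actual API, and the data and hyperparameters below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

def fit_stump(x, r):
    """Best single-split regression stump on residuals r (squared error)."""
    best = (np.inf, x[0], r.mean(), r.mean())
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]

# Gradient boosting: each stump fits the residuals of the current ensemble,
# and its contribution is shrunk by the learning rate.
lr, stumps = 0.3, []
pred = np.zeros_like(y)
for _ in range(100):
    t, vl, vr = fit_stump(X[:, 0], y - pred)
    stumps.append((t, vl, vr))
    pred += lr * np.where(X[:, 0] <= t, vl, vr)

mse = float(np.mean((pred - y) ** 2))
```

The "standard parameters" idea in the abstract corresponds to fixing values like the learning rate and the number of boosting rounds once, rather than re-tuning them per data set.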
Deep neural networks (DNNs) are complex computational models that have found great success in many artificial intelligence applications, such as computer vision [1,2] and natural language processing [3,4]. In the past four years, DNNs have also generated promising results for quantitative structure-activity relationship (QSAR) tasks [5,6]. Previous work showed that DNNs can routinely make better predictions than traditional methods, such as random forests, on a diverse collection of QSAR data sets. It was also found that multitask DNN models (those trained on and predicting multiple QSAR properties simultaneously) outperform DNNs trained separately on the individual data sets in many, but not all, tasks. To date there has been no satisfactory explanation of why the QSAR model of one task embedded in a multitask DNN can borrow information from other, unrelated QSAR tasks. Thus, using multitask DNNs in a way that consistently provides a predictive advantage remains a challenge. In this work, we explored why multitask DNNs make a difference in predictive performance. Our results show that during prediction a multitask DNN does borrow "signal" from molecules with similar structures in the training sets of the other tasks. However, whether this borrowing leads to better or worse predictive performance depends on whether the activities of those tasks are correlated. On the basis of this finding, we have developed a strategy for using multitask DNNs that incorporates prior domain knowledge to select training sets with correlated activities, and we demonstrate its effectiveness on several examples.
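The multitask setup the abstract describes (one network trained on several activities at once) is conventionally a shared hidden layer with one output head per task. A minimal NumPy sketch, using two deliberately correlated synthetic "activities"; the architecture, data, and hyperparameters are illustrative assumptions, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
w = rng.normal(size=8)
# Two correlated "activities": the second is a noisy rescaling of the first,
# mimicking the correlated-task setting where borrowing signal helps.
y1 = X @ w + 0.1 * rng.normal(size=300)
y2 = 0.8 * (X @ w) + 0.1 * rng.normal(size=300)
Y = np.stack([y1, y2], axis=1)

# Shared hidden layer; one linear output column (head) per task.
W1 = rng.normal(scale=0.1, size=(8, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 2)); b2 = np.zeros(2)
lr = 0.02
for _ in range(400):
    h_pre = X @ W1 + b1
    h = np.maximum(h_pre, 0.0)
    P = h @ W2 + b2
    G = (2.0 / Y.size) * (P - Y)          # MSE gradient, both tasks jointly
    gW2, gb2 = h.T @ G, G.sum(0)
    g_h = (G @ W2.T) * (h_pre > 0)        # gradients from both heads mix here
    gW1, gb1 = X.T @ g_h, g_h.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

P = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2
mse = float(np.mean((P - Y) ** 2))
```

Because both heads backpropagate through the same shared layer, each task's gradients shape the representation the other task uses; this is the mechanism by which signal is "borrowed", for better or worse depending on whether the activities correlate.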