Deep learning has drawn significant attention in many areas, including drug discovery. It has been proposed that it could outperform other machine learning algorithms, especially on large data sets. In the pharmaceutical industry, machine learning models are built to capture quantitative structure-activity relationships (QSARs) and to predict molecular activities, including absorption, distribution, metabolism, and excretion (ADME) properties, using only molecular structures. Previous reports have demonstrated the advantages of using deep neural networks (DNNs) for QSAR modeling. One challenge in building DNN models is identifying the hyperparameters that lead to better generalization. In this study, we investigated several tunable hyperparameters of DNN models on 24 industrial ADME data sets. We analyzed the sensitivity and influence of five hyperparameters: the learning rate, the weight decay for L2 regularization, the dropout rate, the activation function, and the use of batch normalization. This paper focuses on strategies and practices for DNN model building. Further, an optimized model was built for each data set and compared with the benchmark models used in production. Based on our benchmarking results, we propose several practices for building DNN QSAR models.
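As a rough illustration of what a search over the five hyperparameters named above might look like, the sketch below enumerates a full grid and a one-at-a-time sensitivity sweep. The five hyperparameter names follow the abstract; every candidate value, the baseline, and the helper names `iter_configs` and `one_at_a_time` are assumptions made for illustration and are not taken from the study.

```python
from itertools import product

# Hypothetical search space. The five hyperparameters are those listed in the
# abstract; all candidate values are illustrative assumptions.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "weight_decay": [0.0, 1e-5, 1e-3],   # L2 regularization strength
    "dropout_rate": [0.0, 0.25, 0.5],
    "activation": ["relu", "tanh"],
    "batch_norm": [False, True],
}

# Hypothetical baseline configuration for the sensitivity sweep.
BASELINE = {
    "learning_rate": 1e-3,
    "weight_decay": 1e-5,
    "dropout_rate": 0.25,
    "activation": "relu",
    "batch_norm": False,
}


def iter_configs(space):
    """Yield every combination of values in the search space (full grid)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def one_at_a_time(space, baseline):
    """Yield configs that differ from the baseline in exactly one hyperparameter."""
    for name, candidates in space.items():
        for value in candidates:
            if value != baseline[name]:
                yield {**baseline, name: value}


configs = list(iter_configs(SEARCH_SPACE))
sweep = list(one_at_a_time(SEARCH_SPACE, BASELINE))
print(len(configs), len(sweep))
```

Each yielded dictionary would then be passed to whatever training routine is in use. The one-at-a-time sweep is far cheaper than the full grid, but it cannot reveal interactions between hyperparameters.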
Could high-quality in silico predictions in drug discovery eventually replace part or most of experimental testing? To evaluate the agreement of selectivity data from different experimental or predictive sources, we introduce a new metric, the concordance minimum significant ratio (cMSR). Using cMSR, we find the overall level of agreement between predicted and experimental data to be comparable to that found between experimental results from different sources. However, for molecules that are either highly selective or potent, the concordance between different experimental sources is significantly higher than the concordance between experimental and predicted values. We also show that computational models built from one data set are less predictive for other data sources, and we highlight the importance of bias correction for assessing selectivity data. Finally, we show that small-molecule target space relationships derived from different data sources and predictive models share overall similarity but can differ significantly in the details.
For a response surface experiment, an approximate hypothesis test and an associated confidence region are proposed for the minimizing (or maximizing) factor-level configuration. Carter et al. (1982, Cancer Research 42, 2963-2971) show that confidence regions for optimal conditions provide a way to make decisions about therapeutic synergism. The response surface may be constrained to lie within a specified, bounded region. These constraint regions can be quite general, which allows for more realistic constraint modeling and broad applicability, including constraints occurring in mixture experiments. The usual assumption of a quadratic model is also generalized to include any regression model that is linear in the model parameters. An intimate connection is established between this confidence region and the Box-Hunter (1954, Biometrika 41, 190-199) confidence region for a stationary point. As a byproduct, this methodology also provides a way to construct a confidence interval for the difference between the optimal mean response and the mean response at a specified factor-level configuration. The application of this confidence region is illustrated with two examples. Extensive simulations indicate that this confidence region has good coverage properties.
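The setting described above can be written out as follows. This is only a notational sketch: the symbols $z(x)$, $\beta$, $R$, $b$, and $B$ are assumptions introduced here, not notation taken from the paper. The first line is a regression model that is linear in its parameters, the second is the constrained optimal factor-level configuration, and the third is the familiar quadratic special case with its unconstrained stationary point (taking $B$ symmetric and invertible).

```latex
\begin{align}
  y &= z(x)^{\top}\beta + \varepsilon, \\
  x^{*} &= \arg\min_{x \in R} \; z(x)^{\top}\beta, \\
  z(x)^{\top}\beta &= \beta_0 + x^{\top} b + x^{\top} B x
  \quad\Longrightarrow\quad
  x_s = -\tfrac{1}{2} B^{-1} b .
\end{align}
```

When $x_s$ lies inside $R$ and $B$ is positive definite, $x^{*}$ and $x_s$ coincide, which is the case in which a confidence region for the optimum and one for the stationary point address the same quantity.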