Alongside validation, the applicability domain (AD) is probably one of the most important aspects determining the quality and reliability of established quantitative structure-activity relationship (QSAR) models. To date, a variety of AD estimation approaches have been devised for particular types of QSAR models, and their practical use is discussed extensively in the literature. The present study introduces a novel, simple, and effective distance-based method for estimating the AD of developed and validated predictive counter-propagation artificial neural network (CP ANN) models by exploiting the Euclidean distance (ED) metric in the structure-representation vector space. The performance of the method was evaluated and explained in a case study using a pre-built, validated CP ANN model that predicts the transport activity of the transmembrane protein bilitranslocase for a diverse set of compounds. The method was tested on two additional datasets to confirm its performance in evaluating the AD of CP ANN models. The compounds identified as potential outliers, i.e., outside the CP ANN model AD, were confirmed in a comparative AD assessment using the leverage approach. Moreover, the method offers a graphical depiction of the AD for fast and simple identification of extreme points.
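The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the published method. It assumes an ED-based check that measures, for each compound, the Euclidean distance from its scaled descriptor vector to the weight vector of its closest (winning) neuron in the CP ANN Kohonen layer, and flags the compound when that distance exceeds the largest such distance in the training set. For comparison it adds the standard leverage check, with leverage h = x^T (X^T X)^-1 x (intercept column included) and warning leverage h* = 3(p + 1)/n. The winner-distance criterion, the max-of-training threshold, and all function names are hypothetical; in the usage part, `W` is just a random stand-in for trained neuron weights.

```python
import numpy as np


def winner_distances(X, W):
    """Euclidean distance from each row of X to its closest neuron weight vector in W."""
    # (n_samples, n_neurons) matrix of pairwise distances via broadcasting
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return d.min(axis=1)


def ed_outliers(X_train, X_query, W):
    """Hypothetical ED rule: flag queries whose winner distance exceeds the
    largest winner distance observed in the training set."""
    threshold = winner_distances(X_train, W).max()
    return winner_distances(X_query, W) > threshold


def leverage(X_train, X_query):
    """Leverage h_i = q_i^T (A^T A)^-1 q_i, with an intercept column added."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    AtA_inv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, AtA_inv, Q)


def leverage_outliers(X_train, X_query):
    """Flag queries above the usual warning leverage h* = 3(p + 1)/n."""
    n, p = X_train.shape
    h_star = 3 * (p + 1) / n
    return leverage(X_train, X_query) > h_star


# Usage with synthetic data (assumption: descriptors already autoscaled)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
W = X_train[rng.choice(50, size=9, replace=False)]  # stand-in for a trained 3x3 Kohonen layer
X_query = np.array([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]])
print(ed_outliers(X_train, X_query, W))
print(leverage_outliers(X_train, X_query))
```

Both checks operate on the same descriptor space, which is what makes a side-by-side comparison of the flagged compounds meaningful; the ED rule, however, depends on the trained network weights, while leverage depends only on the training descriptors.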
Drug-induced liver injury is a major concern in drug development. Expensive and time-consuming in vitro and in vivo studies do not reflect the complexity of the phenomenon. In silico approaches complement wet-lab methods and offer a cost-efficient means of toxicity prediction. The aim of our study was to explore the capability of counter-propagation artificial neural networks (CPANNs) to classify an imbalanced dataset related to idiosyncratic drug-induced liver injury and to develop a model for predicting the hepatotoxic potential of drugs. Genetic algorithm optimization was used to build CPANN models that classify drugs into hepatotoxic and non-hepatotoxic classes using molecular descriptors. To handle the imbalanced dataset, we modified the classical CPANN training algorithm by integrating random subsampling into the training procedure, improving the classification ability of CPANN. Based on the number of models accepted by internal validation and on the prediction statistics for the external set, we concluded that, for the CPANN learning methodology, using an imbalanced set with balanced subsampling in each learning epoch is a better approach than using a fixed balanced set.
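To make per-epoch balanced subsampling concrete, here is a minimal CPANN sketch built on assumptions not taken from the study: a rectangular, non-toroidal Kohonen layer; a triangular neighborhood whose radius shrinks linearly; a linearly decreasing learning rate; and, at the start of every epoch, undersampling of each class to the size of the minority class. The function names and hyperparameters are hypothetical, and the genetic algorithm descriptor and parameter optimization is omitted.

```python
import numpy as np


def balanced_epoch_indices(y, rng):
    """Shuffled indices with equal counts per class: the minority class is kept
    whole and the other classes are randomly undersampled (assumed scheme)."""
    classes, counts = np.unique(y, return_counts=True)
    m = counts.min()
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), size=m, replace=False)
                          for c in classes])
    rng.shuffle(idx)
    return idx


def train_cpann(X, y, grid=(5, 5), epochs=50, eta_max=0.5, eta_min=0.01, seed=0):
    """Minimal CPANN: Kohonen layer W (input side) and Grossberg layer G (output
    side) trained together on one-hot class targets; y must be integer labels."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n_neurons = rows * cols
    n_classes = int(y.max()) + 1
    Y = np.eye(n_classes)[y]  # one-hot targets
    W = rng.uniform(X.min(0), X.max(0), size=(n_neurons, X.shape[1]))
    G = np.full((n_neurons, n_classes), 1.0 / n_classes)
    # grid coordinates of each neuron, used for topological distances
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    max_radius = max(rows, cols) // 2
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        eta = eta_max - (eta_max - eta_min) * frac   # linearly decreasing learning rate
        radius = round(max_radius * (1 - frac))      # linearly shrinking neighborhood
        for i in balanced_epoch_indices(y, rng):     # fresh balanced subset each epoch
            winner = np.argmin(np.linalg.norm(W - X[i], axis=1))
            d = np.abs(coords - coords[winner]).max(axis=1)  # Chebyshev grid distance
            a = np.where(d <= radius, 1 - d / (radius + 1), 0.0)[:, None]  # triangular
            W += eta * a * (X[i] - W)  # pull input weights toward the sample
            G += eta * a * (Y[i] - G)  # pull output weights toward its class target
    return W, G


def predict(W, G, X):
    """Class of each sample = argmax of the output weights of its winning neuron."""
    winners = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
    return G[winners].argmax(axis=1)
```

Because a new subset is drawn each epoch, the network is exposed to most majority-class compounds over the whole run, while every epoch remains balanced; a fixed balanced set instead discards the same majority-class compounds for the entire training.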