Read-across is a popular data gap filling technique used within analogue and category approaches for regulatory purposes. In recent years there have been many efforts focused on the challenges involved in read-across development, its scientific justification and documentation. Tools have also been developed to facilitate read-across development and application. Here, we describe a number of publicly available read-across tools in the context of the category/analogue workflow and review their respective capabilities, strengths and weaknesses. No single tool addresses all aspects of the workflow. We highlight how the different tools complement each other and some of the opportunities for their further development to address the continued evolution of read-across.
Quantitative structure-activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used in chemical risk assessment for the protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques from machine learning can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model exposes a cut-off parameter that can be varied to select the desired trade-off between sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the Gold Carcinogenic Potency Database (480 chemicals).
Leave-one-out cross-validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8% and 80.4%, and balanced accuracy: 80.6% and 80.8%) and the highest inter-rater agreement [kappa (κ): 0.63 and 0.62] for both datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. This feature gives regulators additional control in grading a chemical based on the severity of the toxic endpoint under study. Electronic supplementary material: The online version of this article (doi:10.1186/s13321-016-0164-0) contains supplementary material, which is available to authorized users.
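The consensus idea in this abstract can be illustrated with a minimal sketch. It assumes a naive Bayes combiner over binary calls from several tools, which is only one possible form of "Bayesian classification"; the abstract does not specify the actual model. The function names, the smoothing constant and the toy data are all hypothetical and are not taken from the study.

```python
from math import prod

def fit(X, y, alpha=1.0):
    """Fit naive Bayes on binary tool calls (1 = predicted carcinogen).
    Assumption: tools are treated as conditionally independent given the class."""
    n_tools = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(y) + 2 * alpha)
        # Laplace-smoothed P(tool j says 1 | class c)
        p_pos = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_tools)]
        model[c] = (prior, p_pos)
    return model

def posterior_positive(model, x):
    """Posterior probability that a chemical is positive given the tool calls."""
    def joint(c):
        prior, p_pos = model[c]
        return prior * prod(p if v == 1 else 1.0 - p for p, v in zip(p_pos, x))
    j1, j0 = joint(1), joint(0)
    return j1 / (j1 + j0)

def loo_predictions(X, y, cutoff=0.5):
    """Leave-one-out: refit without chemical i, then classify chemical i."""
    preds = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(1 if posterior_positive(model, X[i]) >= cutoff else 0)
    return preds

def metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, balanced accuracy and Cohen's kappa.
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    sens, spec, acc = tp / (tp + fn), tn / (tn + fp), (tp + tn) / n
    # Chance agreement from the marginal totals of the confusion matrix
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc,
            "balanced_accuracy": (sens + spec) / 2,
            "kappa": (acc - p_e) / (1 - p_e)}

# Hypothetical toy data: rows are chemicals, columns are four tools' binary calls.
X = [(1, 1, 1, 0), (1, 0, 1, 1), (0, 1, 1, 1), (1, 1, 0, 1),
     (0, 0, 0, 1), (0, 1, 0, 0), (0, 0, 0, 0), (1, 0, 0, 0)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, metrics(y, loo_predictions(X, y, cutoff)))
```

Lowering the cut-off can only add positive calls, so sensitivity never decreases and specificity never increases as the cut-off drops, which is the trade-off the abstract describes.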
The application of toxic equivalency factors (TEFs) or toxic units to estimate toxic potencies for mixtures of chemicals that contribute to a biological effect through a common mechanism is one approach for filling data gaps. Toxic Equivalents (TEQ) have been used to express the toxicity of dioxin-like compounds (i.e., dioxins, furans, and dioxin-like polychlorinated biphenyls (PCBs)) in terms of the most toxic form of dioxin: 2,3,7,8-tetrachlorodibenzo-p-dioxin (2,3,7,8-TCDD). This study sought to integrate two data gap filling techniques, quantitative structure-activity relationships (QSARs) and TEFs, to predict neurotoxicity TEQs for PCBs. Simon et al. (2007) previously derived neurotoxic equivalent (NEQ) values for a dataset of 87 PCB congeners, of which 83 congeners had experimental data. These data were taken from a set of four different studies measuring different effects related to neurotoxicity, each of which tested overlapping subsets of the 83 PCB congeners. The goals of the current study were to: (i) evaluate alternative neurotoxic equivalent factor (NEF) derivations from an expanded dataset, relative to those derived by Simon et al., and (ii) develop QSAR models to provide NEF estimates for the large number of untested PCB congeners. The models used multiple linear regression, support vector regression, k-nearest neighbor and random forest algorithms within a 5-fold cross-validation scheme, with position-specific chlorine substitution patterns on the biphenyl scaffold as descriptors. Alternative NEF values were derived, but the resulting QSAR models had relatively low predictivity (RMSE ~0.24). This was mostly driven by the large uncertainties in the underlying data and NEF values. The derived NEFs and the QSAR-predicted NEFs used to fill data gaps should be applied with caution.
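Two ideas from this abstract lend themselves to a short sketch: a toxic equivalent computed as a potency-weighted sum of concentrations, and a position-specific chlorine descriptor for a PCB congener. The 10-bit encoding below is an assumption about what such a descriptor could look like, since the abstract does not give the exact scheme. The congener names `congener_A` and `congener_B`, their factor values and all concentrations are hypothetical placeholders; only the reference value of 1.0 for 2,3,7,8-TCDD follows from the definition of a TEF.

```python
# Ring positions on the biphenyl scaffold that can carry chlorine
# (primed positions belong to the second ring).
POSITIONS = ["2", "3", "4", "5", "6", "2'", "3'", "4'", "5'", "6'"]

def chlorine_descriptor(substitution):
    """Encode a pattern such as "2,3',4,4',5" as a 10-element 0/1 vector.
    Assumed encoding for illustration, not the study's descriptor set."""
    occupied = {p.strip() for p in substitution.split(",") if p.strip()}
    unknown = occupied - set(POSITIONS)
    if unknown:
        raise ValueError(f"unknown ring positions: {sorted(unknown)}")
    return [1 if p in occupied else 0 for p in POSITIONS]

def toxic_equivalent(concentrations, factors):
    """TEQ = sum over congeners of concentration x equivalency factor.
    Congeners without a factor are skipped rather than guessed."""
    return sum(conc * factors[name]
               for name, conc in concentrations.items()
               if name in factors)

# Hypothetical factors relative to the reference compound (factor 1.0).
factors = {"2,3,7,8-TCDD": 1.0, "congener_A": 0.1, "congener_B": 0.001}
# Hypothetical concentrations in arbitrary but consistent units.
sample = {"2,3,7,8-TCDD": 2.0, "congener_A": 10.0, "congener_B": 100.0}

print(toxic_equivalent(sample, factors))
print(chlorine_descriptor("2,3',4,4',5"))
```

The same weighted sum applies to neurotoxic equivalents if NEFs are substituted for TEFs, and the descriptor vectors could serve as input features for the regression algorithms the abstract lists.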
The toxicokinetic (TK) parameters fraction of the chemical unbound to plasma proteins and metabolic clearance are critical for relating exposure and internal dose when building in vitro-based risk assessment models. However, experimental toxicokinetic studies have been carried out on only a limited number of chemicals of environmental interest (~1000 chemicals with TK data relative to tens of thousands of chemicals of interest). This work comprised four steps: evaluation of the utility of chemical structure information to predict TK parameters in silico; development of cluster-based read-across and quantitative structure-activity relationship models of fraction unbound, fub (regression), and intrinsic clearance, Clint (classification and regression), using a dataset of 1487 chemicals; use of the predicted TK parameters to estimate uncertainty in steady-state plasma concentration (Css); and subsequent in vitro-in vivo extrapolation analyses to derive a bioactivity-exposure ratio (BER) plot comparing human oral equivalent doses with exposure predictions, using androgen and estrogen receptor activity data for 233 chemicals as an example dataset. The results demonstrate that fub is structurally more predictable than Clint. The model with the highest observed performance for fub had an external test set RMSE/σ = 0.62 and R² = 0.61; for Clint classification, an external test set accuracy of 65.9%; and for Clint regression, an external test set RMSE/σ = 0.90 and R² = 0.20. This relatively low performance is in part due to the large uncertainty in the underlying Clint data. We show that Css is relatively insensitive to uncertainty in Clint. The models were benchmarked against the ADMET Predictor software. Finally, the BER analysis allowed identification of 14 out of 136 chemicals for further risk assessment, demonstrating the utility of these models in aiding risk-based chemical prioritization.
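The chain from TK parameters to a bioactivity-exposure ratio can be sketched as follows. The steady-state expression used here is a commonly cited simplified form that combines renal filtration with well-stirred hepatic clearance; the abstract does not state which model the authors used, so treat it as an assumption. All function names, parameter values and units below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def steady_state_concentration(fub, clint, dose_rate=1.0, gfr=1.0, q_liver=1.0):
    """Steady-state plasma concentration for a constant oral dose rate.

    Assumed model (not confirmed by the source):
        Css = dose_rate / (GFR * fub + Q * fub * Clint / (Q + fub * Clint))
    Assumed units: dose_rate in mg/kg/day; gfr, q_liver and clint in L/day/kg,
    with clint already scaled to the whole liver, giving Css in mg/L.
    """
    renal = gfr * fub
    hepatic = q_liver * fub * clint / (q_liver + fub * clint)
    return dose_rate / (renal + hepatic)

def oral_equivalent_dose(bioactive_conc, css_per_unit_dose):
    """Dose (mg/kg/day) whose Css equals the in vitro bioactive concentration.
    Relies on Css scaling linearly with dose; both concentrations must share units."""
    return bioactive_conc / css_per_unit_dose

def bioactivity_exposure_ratio(oed, exposure):
    """BER = oral equivalent dose / predicted exposure (same dose units)."""
    return oed / exposure

# Hypothetical chemical: fraction unbound 0.5, intrinsic clearance 2.0 L/day/kg.
css = steady_state_concentration(fub=0.5, clint=2.0)
oed = oral_equivalent_dose(bioactive_conc=10.0, css_per_unit_dose=css)
ber = bioactivity_exposure_ratio(oed, exposure=0.05)
print(css, oed, ber)

# Propagating a range of clearance values shows how its uncertainty
# carries through to Css for this particular parameter set.
for clint in (0.5, 2.0, 8.0):
    print(clint, steady_state_concentration(fub=0.5, clint=clint))
```

A small BER means the predicted exposure approaches the dose expected to produce bioactivity, so such chemicals would rank higher for follow-up under this kind of screening.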