Most quantitative structure-activity relationship (QSAR) models are linear and valid only within a limited domain of compounds. Here we propose a data-driven approach that flexibly combines unsupervised and supervised neural networks to predict the toxicity of a large, diverse set of chemicals while still respecting the QSAR postulates. Because QSAR applies only to similar compounds, which share similar biological and physicochemical properties, the compounds are first clustered, a local model is built for each cluster, and the local models are then combined in an ensemble to obtain the final result. The approach has been used to develop models that predict toxicity to the fish Pimephales promelas and to the protozoan Tetrahymena pyriformis.
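The cluster-then-model pipeline described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: plain k-means stands in for the unsupervised network, a one-descriptor least-squares line stands in for each supervised local network, and the local predictions are combined with distance-based weights. All function names and the `beta` sharpness parameter are hypothetical.

```python
import math

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))

def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation.
    Stand-in (assumption) for the unsupervised network in the text."""
    points = [tuple(p) for p in points]
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centers[j])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.
    Stand-in (assumption) for a supervised local model."""
    n = len(xs)
    if n == 0:
        return 0.0, 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx

def train(descriptors, toxicity, k):
    """Cluster the compounds, then fit one local model per cluster
    (here on the first descriptor only, to keep the sketch short)."""
    centers, labels = kmeans(descriptors, k)
    models = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        models.append(fit_line([descriptors[i][0] for i in idx],
                               [toxicity[i] for i in idx]))
    return centers, models

def predict(x, centers, models, beta=1.0):
    """Combine local predictions, weighting each by closeness to its cluster."""
    d2 = [math.dist(x, c) ** 2 for c in centers]
    shift = min(d2)  # subtract the minimum so the weights cannot all underflow
    weights = [math.exp(-beta * (d - shift)) for d in d2]
    local = [slope * x[0] + intercept for slope, intercept in models]
    return sum(w * y for w, y in zip(weights, local)) / sum(weights)
```

Weighting by distance to the cluster center means a compound deep inside one cluster is predicted almost entirely by that cluster's model, while a compound between clusters receives a blend. Whether the original work used hard assignment or a soft combination like this one is not stated in the abstract.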
In many real-world applications, simple classifiers are too weak to have predictive power. Ensemble techniques, or mixtures of experts, are a possible solution. We illustrate why mixtures of experts are a natural choice in domains such as predicting the environmental toxicity of chemicals when a structural approach is pursued. The real data used here are derived from peer-reviewed experiments and are publicly available, but they are difficult to model. We used them to predict aquatic toxicity for fish. Chemical information was coded into a set of about 160 descriptors; after reducing the dimensionality of the feature vector with several techniques, we used multivariate regression to build a model of the toxic effects of chemicals. Defining toxicity as a category, as in European Union (EU) regulations, we extended the study to predict toxicity class. The poor predictive power of this simple approach led us to reconsider the problem from a more theoretical angle. To achieve better results, we applied the locality criterion and built separate local classifiers, one for each chemical class. We then combined the classifiers into a complete system that can make predictions for any chemical within the classes studied.
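The idea of one local classifier per chemical class, combined into a single system, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' method: the nearest-centroid expert is a deliberately simple stand-in for whatever local classifier was used, the routing by a known chemical-class label is assumed, and the default cut-offs of 1, 10 and 100 mg/L are assumed values for binning LC50 into toxicity classes rather than figures taken from the text. All class and function names are hypothetical.

```python
import math
from collections import defaultdict

def toxicity_class(lc50_mg_per_l, cutoffs=(1.0, 10.0, 100.0)):
    """Bin an LC50 value into a class: 0 is most toxic, len(cutoffs) least.
    The default cut-offs are an assumption made for this sketch."""
    for rank, limit in enumerate(cutoffs):
        if lc50_mg_per_l <= limit:
            return rank
    return len(cutoffs)

class NearestCentroid:
    """Tiny local expert (stand-in): predicts the class with the closest centroid."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(tuple(x))
        self.centroids = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda label: math.dist(x, self.centroids[label]))

class MixtureOfLocalExperts:
    """One expert per chemical class; a query is routed to its class's expert."""

    def fit(self, chem_classes, X, lc50_values):
        buckets = defaultdict(lambda: ([], []))
        for chem, x, value in zip(chem_classes, X, lc50_values):
            buckets[chem][0].append(x)
            buckets[chem][1].append(toxicity_class(value))
        self.experts = {chem: NearestCentroid().fit(xs, ys)
                        for chem, (xs, ys) in buckets.items()}
        return self

    def predict(self, chem, x):
        if chem not in self.experts:
            # Outside the chemical classes studied: no local expert applies.
            raise KeyError(f"no local expert for chemical class {chem!r}")
        return self.experts[chem].predict(x)
```

Raising an error for an unseen chemical class reflects the locality criterion: the combined system is only meant to answer for chemicals that fall within the classes it was trained on.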