Structure-activity relationship (SAR) models are recognized as powerful tools to predict the toxicologic potential of new or untested chemicals and also provide insight into possible mechanisms of toxicity. Models have been based on physicochemical attributes and structural features of chemicals. We describe herein the development of a new SAR modeling algorithm called cat-SAR that is capable of analyzing and predicting chemical activity from divergent biological response data. The cat-SAR program develops chemical fragment-based SAR models from categorical biological response data (e.g., toxicologically active and inactive compounds). The database selected for model development was a published set of chemicals documented to cause respiratory hypersensitivity in humans. Two models were generated that differed only in that one model included explicit hydrogen-containing fragments. The predictive abilities of the models were tested using leave-one-out cross-validation. One model had a sensitivity of 0.94 and a specificity of 0.87, yielding an overall correct prediction of 91%. The second model had a sensitivity of 0.89, a specificity of 0.95, and an overall correct prediction of 92%. The demonstrated predictive capabilities of the cat-SAR approach, together with its modeling flexibility and design transparency, suggest the potential for its widespread applicability to toxicity prediction and for deriving mechanistic insight into toxicologic effects.
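The sensitivity, specificity, and overall correct prediction figures quoted in this abstract are standard confusion-matrix ratios. As a minimal sketch of how such figures are typically computed, assuming each compound has an experimental label and a cross-validated prediction (the function name `classification_metrics` and the toy labels are hypothetical and are not taken from the study):

```python
from typing import Dict, Sequence


def classification_metrics(actual: Sequence[bool],
                           predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and concordance.

    actual[i]    -- experimental category (True = active)
    predicted[i] -- model prediction for the same compound
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # actives predicted active
    fn = sum(1 for a, p in pairs if a and not p)      # actives predicted inactive
    tn = sum(1 for a, p in pairs if not a and not p)  # inactives predicted inactive
    fp = sum(1 for a, p in pairs if not a and p)      # inactives predicted active

    nan = float("nan")
    return {
        # fraction of experimentally active compounds predicted active
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # fraction of experimentally inactive compounds predicted inactive
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # fraction of all compounds predicted correctly
        "concordance": (tp + tn) / len(pairs) if pairs else nan,
    }


# Toy labels for illustration only (not data from the study).
toy_actual = [True, True, True, True, False, False]
toy_predicted = [True, True, True, False, False, True]
toy_metrics = classification_metrics(toy_actual, toy_predicted)
```

Because the overall correct prediction rate weights the two classes by their size, it always falls between the sensitivity and the specificity.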
Structure-activity relationship (SAR) models are powerful tools to investigate the mechanisms of action of chemical carcinogens and to predict the potential carcinogenicity of untested compounds. We describe here the application of the cat-SAR (categorical-SAR) program to two learning sets of rat mammary carcinogens. One set of developed models was based on a comparison of rat mammary carcinogens to rat noncarcinogens (MC-NC), and the second set compared rat mammary carcinogens to rat nonmammary carcinogens (MC-NMC). On the basis of a leave-one-out validation, the best rat MC-NC model achieved a concordance between experimental and predicted values of 84%, a sensitivity of 79%, and a specificity of 89%. Likewise, the best rat MC-NMC model achieved a concordance of 78%, a sensitivity of 82%, and a specificity of 74%. The MC-NMC model was based on a learning set that contained carcinogens in both the active (i.e., mammary carcinogens) and the inactive (i.e., carcinogens to sites other than the mammary gland) categories and was able to distinguish between these different types of carcinogens (i.e., tissue specific), not simply between carcinogens and noncarcinogens. On the basis of a structural comparison between this model and one for Salmonella mutagens, there was, as expected, a significant relationship between the two phenomena, since a high proportion of breast carcinogens are Salmonella mutagens. However, when analyzing the specific structural features derived from the MC-NC learning set, a dichotomy was observed between fragments associated with mammary carcinogenesis and mutagenicity and others that were associated with estrogenic activity. Overall, these findings suggest that the MC-NC and MC-NMC models are able to identify structural attributes that may in part address the question of "why do some carcinogens cause breast cancer", which is a different question from "why do some chemicals cause cancer".
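The leave-one-out validation mentioned in this abstract holds out each compound in turn, builds a model from the remaining compounds, and predicts the held-out one. The following is a minimal sketch of that procedure only. The fragment-voting predictor `toy_predict` is a hypothetical stand-in and is not the cat-SAR algorithm; the fragment labels ("A", "B", ...) and the 0.5 cutoff are likewise assumptions made for illustration.

```python
from typing import FrozenSet, List, Optional, Tuple

# A compound is (set of fragment labels, experimental category).
Compound = Tuple[FrozenSet[str], bool]


def toy_predict(training: List[Compound],
                fragments: FrozenSet[str]) -> Optional[bool]:
    """Hypothetical stand-in predictor (NOT the cat-SAR algorithm).

    For each fragment of the query that also occurs in the training set,
    take the fraction of training compounds containing it that are active,
    then average those fractions. Returns None when no fragment is known.
    """
    scores = []
    for frag in fragments:
        labels = [active for frags, active in training if frag in frags]
        if labels:
            scores.append(sum(labels) / len(labels))
    if not scores:
        return None  # no basis for a prediction
    return sum(scores) / len(scores) >= 0.5  # assumed cutoff


def leave_one_out(dataset: List[Compound]) -> List[Tuple[bool, Optional[bool]]]:
    """Return (experimental, predicted) pairs, one per held-out compound."""
    results = []
    for i, (fragments, active) in enumerate(dataset):
        training = dataset[:i] + dataset[i + 1:]  # everything except compound i
        results.append((active, toy_predict(training, fragments)))
    return results


# Toy learning set for illustration only (not data from the study).
toy_set: List[Compound] = [
    (frozenset({"A", "B"}), True),
    (frozenset({"A"}), True),
    (frozenset({"C"}), False),
    (frozenset({"C", "D"}), False),
    (frozenset({"E"}), True),
]
loo_results = leave_one_out(toy_set)
```

In this sketch, a compound whose fragments never occur in the remaining learning set receives no prediction (`None`) rather than a forced call; how such cases are counted affects the reported concordance, so the choice should be stated alongside any results.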