2004
DOI: 10.1002/chin.200439208
Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure—Activity Relationships of Noncongeneric Compounds.

Abstract: This paper explores the utility of data mining and machine learning algorithms for the induction of mutagenicity structure-activity relationships (SARs) from noncongeneric data sets. We compare (i) a newly developed algorithm (MOLFEA) for the generation of descriptors (molecular fragments) for noncongeneric compounds with traditional SAR approaches (molecular properties) and (ii) different machine learning algorithms for the induction of SARs from these descriptors. In addition we investigate the optimal param…

Cited by 70 publications (101 citation statements)
References 9 publications
“…A recent comprehensive review [232] of different in silico models and approaches for predictions of genotoxic outcome, shows that most of the earlier approaches described for the prediction of Ames mutagenicity produced good specificity and sensitivity values (prediction accuracy of up to 85%). Depending on the descriptors and the statistical methods used, some of the models offer simple SAR information [82,244], whilst others are harder to interpret due to the choice of chemical descriptors derived from structural information [245,246].…”
Section: Modeling Studies | mentioning
confidence: 99%
“…Different QSAR and machine learning methods have been used to derive in silico predictions about the Ames outcome of the chemicals. These include Ames test QSAR models using PLS, NN, RF, and SVM [46,244-250].…”
Section: Modeling Studies | mentioning
confidence: 99%
“…In these scientific applications, graph patterns can help build classification model for better predicting unknown graphs between different classes and understanding these complex structures. For example, in chemical compounds data analysis, graph patterns can reveal that which substructures are the characteristics of chemical toxicity [6].…”
Section: Introduction | mentioning
confidence: 99%
“…Specifically, each graph is represented as a binary vector of pattern indicators (Figure 1). Graph mining is especially popular in chemoinformatics, where the task is to classify chemical compounds [11,7]. When all possible subgraphs are used, the dimensionality of the feature space is too large for usual statistical methods.…”
Section: Introduction | mentioning
confidence: 99%
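
The binary pattern-indicator representation mentioned in the statement above can be illustrated with a small sketch. This is not the method of the cited papers or of MOLFEA: it assumes toy labeled graphs and, for simplicity, patterns restricted to linear paths of atom labels. The names `contains_path` and `indicator_vector` and both example graphs are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Toy labeled graph (assumption): node id -> atom label, plus an undirected adjacency list.
Graph = Tuple[Dict[int, str], Dict[int, List[int]]]


def contains_path(graph: Graph, pattern: List[str]) -> bool:
    """Return True if some simple path in the graph spells out `pattern`."""
    labels, adj = graph
    if not pattern:
        return True  # the empty pattern occurs trivially

    def extend(node: int, pos: int, visited: FrozenSet[int]) -> bool:
        # The current node must carry the label expected at this position.
        if labels[node] != pattern[pos]:
            return False
        if pos == len(pattern) - 1:
            return True
        # Try to continue the path through every unvisited neighbour.
        return any(
            extend(nxt, pos + 1, visited | {nxt})
            for nxt in adj.get(node, [])
            if nxt not in visited
        )

    return any(extend(start, 0, frozenset({start})) for start in labels)


def indicator_vector(graph: Graph, patterns: List[List[str]]) -> List[int]:
    """One entry per pattern: 1 if the pattern occurs in the graph, else 0."""
    return [1 if contains_path(graph, p) else 0 for p in patterns]


# Hypothetical example data: two small chains, C-C-N-O and C-C-O.
graph_a: Graph = ({0: "C", 1: "C", 2: "N", 3: "O"},
                  {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
graph_b: Graph = ({0: "C", 1: "C", 2: "O"},
                  {0: [1], 1: [0, 2], 2: [1]})
patterns = [["C", "N"], ["N", "O"], ["C", "O"], ["C", "C", "O"]]

vector_a = indicator_vector(graph_a, patterns)
vector_b = indicator_vector(graph_b, patterns)
```

Each resulting vector has one entry per pattern, so the feature space grows with the number of patterns considered, which is the dimensionality concern the statement raises when all possible subgraphs are used.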
“…A naive approach is to use a frequent substructure mining algorithm such as AGM [9], gSpan [36] or Gaston [20] to collect frequently appearing patterns. This approach was employed by [7] and [11], where a linear support vector machine is used for classification. A more advanced approach is to mine informative patterns with high correlation to the output variable [19,1].…”
Section: Introduction | mentioning
confidence: 99%