1979
DOI: 10.1021/jm00196a017

Chance factors in studies of quantitative structure-activity relationships

Abstract: Multiple regression analysis is a basic statistical tool used for QSAR studies in drug design. However, there is a risk of arriving at fortuitous correlations when too many variables are screened relative to the number of available observations. In this regard, a critical distinction must be made between the number of variables screened for possible correlation and the number which actually appear in the regression equation. Using a modified Fortran stepwise multiple-regression analysis program, simulated QSAR…
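
The risk described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' Fortran stepwise-regression program; it is a simplified stand-in that screens random single variables against a random activity column and reports the best r² found by chance. All function names and parameters (`pearson_r`, `best_chance_r2`, `mean_best_r2`, `trials`, `seed`) are hypothetical and chosen only for illustration.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / (sxx * syy) ** 0.5


def best_chance_r2(n_obs, n_screened, rng):
    """Screen n_screened random variables against random 'activities'
    and return the largest single-variable r^2 found."""
    y = [rng.random() for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_screened):
        x = [rng.random() for _ in range(n_obs)]
        best = max(best, pearson_r(x, y) ** 2)
    return best


def mean_best_r2(n_obs, n_screened, trials=200, seed=0):
    """Average the best chance r^2 over repeated random data sets."""
    rng = random.Random(seed)
    total = sum(best_chance_r2(n_obs, n_screened, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Same number of observations, increasing number of screened variables.
    for n_screened in (1, 5, 20):
        print(n_screened, round(mean_best_r2(10, n_screened), 3))
```

Because every column here is pure noise, any r² the screening reports is fortuitous. Holding the number of observations fixed and raising the number of screened variables should raise the best chance r², which is the effect the abstract warns about. A fuller sketch would fit multi-variable equations stepwise, as the paper's program does.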

Citations: cited by 665 publications (406 citation statements)
References: 4 publications (4 reference statements)
“…This is called the Topliss ratio. 16 As a rule-of-thumb, it is recommended that the Topliss ratio should have a value of at least 5. The structure of a QSAR model should always be inspected by validation techniques, able to detect over fitting due to variable multicollinearity, noise, sample specificity, and unjustified model complexity.…”
Section: Measure Of the Validity Of The Model
Citation type: mentioning
confidence: 99%
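
The rule of thumb in the statement above can be written as a short check. The quoted passage elides the definition, so this sketch assumes the commonly used one: the Topliss ratio is the number of compounds (observations) divided by the number of descriptors (variables). The function names and the `minimum` parameter are hypothetical. Which variable count to pass (screened, or appearing in the final equation) is left to the caller, since the abstract stresses that the two must be distinguished.

```python
def topliss_ratio(n_compounds, n_descriptors):
    """Assumed definition: observations per descriptor."""
    if n_descriptors <= 0:
        raise ValueError("n_descriptors must be positive")
    return n_compounds / n_descriptors


def meets_rule_of_thumb(n_compounds, n_descriptors, minimum=5.0):
    """True when the ratio reaches the recommended minimum of 5."""
    return topliss_ratio(n_compounds, n_descriptors) >= minimum


if __name__ == "__main__":
    print(topliss_ratio(30, 4))        # 7.5
    print(meets_rule_of_thumb(30, 4))  # True
    print(meets_rule_of_thumb(12, 4))  # False
```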
“…As this structure might be associated with 'chance' correlations, 20 we inverted the respective roles of genes and normal/leukemic promyelocytes (statistical units and variables, respectively) and the resulting matrix was submitted to PCA. This method uses an unsupervised learning approach to investigate natural co-expression structures (components) of the entire genome.…”
Section: Principal Component Analysis
Citation type: mentioning
confidence: 99%
“…Cell sugars are therefore preferable to fatty acids as chemotaxonomic parameters for the Actinobacillus-Haemophilus-Pasteurella group. For a data-analytical problem such as that presented here, where discrimination between classes is of interest, but where the number of variables are approximately 2-3 times the number of samples in the major classes, the risk for chance correlations are obvious if one variable at a time is studied, or if methods based on multiple regression are used (Topliss & Edwards, 1979). One cannot, therefore, by directly screening one variable at a time, conclude that the D-glycero-D-mannoheptose variable is a 'true' indicator variable despite the fact that it perfectly distinguishes the A. actinomycetemcomitans and H.…”
Section: Discussion
Citation type: mentioning
confidence: 99%