Variable Selection in QSAR Models for Drug Design

Tsygankova, I. G.

doi:10.2174/157340908784533238

Cited by 17 publications

(7 citation statements)

References 0 publications

“…Feature selection techniques have been successfully applied in many real-world applications, such as large-scale biological data analysis [24, 25, 26], text classification [27], information retrieval [28], near-infrared spectroscopy [29], mass spectroscopy data analysis [30], drug design [31, 32], and especially the quantitative structure-activity relationship (QSAR) modeling [33, 34]. In cancer research community, feature selection has also been widely applied in different omics data analyses: mRNA data [9, 35], miRNA data [36, 37], whole exome sequencing data [38], DNA-methylation data [39, 40], and proteomics data [41, 42].…”

Section: Feature Selection Techniques (mentioning)

confidence: 99%

A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis

Liang, Yang, et al. 2018

Computational and Structural Biotechnology Journal

With the rapid accumulation of gene expression data from various technologies, e.g., microarray, RNA-sequencing (RNA-seq), and single-cell RNA-seq, it is necessary to carry out dimensionality reduction and feature (signature gene) selection to make sense of such high-dimensional data. These computational methods significantly facilitate further data analysis and interpretation, such as gene function enrichment analysis, cancer biomarker detection, and drug target identification in precision medicine. Although numerous methods have been developed for feature selection in bioinformatics, it is still a challenge to choose the appropriate method for a specific problem and to find the most reasonable feature ranking. Meanwhile, paired gene expression data under the matched case-control design (MCCD) are becoming increasingly popular; such data have often been used in multi-omics integration studies and may increase feature selection efficiency by offsetting similar distributions of confounding features. Feature selection methods specifically designed for paired data, termed matched-pairs feature selection (MPFS), have not, however, matured in parallel. In this review, we compare the performance of 10 feature selection methods (eight MPFS methods and two traditional unpaired methods) on two real datasets by applying three classification methods, and analyze the algorithmic complexity of these methods by running their programs. This review aims to induce and comprehensively present MPFS in such a way that readers can easily understand its characteristics and get a clue in selecting the appropriate methods for their analyses.

“…A high value of the statistical feature (R² > 0.5) in the cross-validations is considered proof of the high predictive ability of a model. Within the data analysis stage, the partial least squares (PLS), the multivariate linear regression (MLR), and the artificial neural network (ANN) are the techniques used for the selection of a subset of the most relevant molecular descriptors [24].…”

Section: Introduction (mentioning)

confidence: 99%

Application of k-means clustering, linear discriminant analysis and multivariate linear regression for the development of a predictive QSAR model on 5-lipoxygenase inhibitors

Andrada, Vega-Hissi, Estrada, et al. 2015

Chemometrics and Intelligent Laboratory Systems
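The cross-validated R² threshold quoted in this citation statement can be illustrated with a minimal sketch. It assumes leave-one-out cross-validation and a single-descriptor linear model; the function names `fit_line` and `loo_q2` and the data are hypothetical and are not taken from the cited paper. Conventions for the cross-validated R² differ between authors, so this is one common form rather than the definition used there.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated R^2: 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2 = loo_q2(descriptor, activity)
print(f"LOO cross-validated R^2 = {q2:.3f}, passes 0.5 threshold: {q2 > 0.5}")
```

Because each compound is predicted by a model that never saw it, this statistic is usually lower than the fitted R² on the same data.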

“…Assessment of descriptor importance with respect to the response variable is well established within the QSAR community as a method to provide mechanistic insight [2]. ML methods such as partial least-squares [3], random forest [4], and artificial neural networks [5], as well as entropy (Gini index) and near neighbor based (ReliefF) methods, are used to rank the descriptor importance for the full data set. However, such global ranking of descriptors in a structurally diverse data set might not be accurate for individual compounds predicted by a nonlinear ML algorithm, and ultimately the structure activity relationship (SAR) should be understood locally for each scaffold.…”

Section: Introduction (mentioning)

confidence: 99%

Localized Heuristic Inverse Quantitative Structure Activity Relationship with Bulk Descriptors Using Numerical Gradients

Stålring, Almeida, Carlsson, et al. 2013

J. Chem. Inf. Model.

State-of-the-art quantitative structure-activity relationship (QSAR) models are often based on nonlinear machine learning algorithms, which are difficult to interpret. From a pharmaceutical perspective, QSARs are used to enhance the chemical design process. Ultimately, they should not only provide a prediction but also contribute to a mechanistic understanding and guide modifications to the chemical structure, promoting compounds with desirable biological activity profiles. Global ranking of descriptor importance and inverse QSAR have been used for these purposes. This paper introduces localized heuristic inverse QSAR, which provides an assessment of the relative ability of the descriptors to influence the biological response in an area localized around the predicted compound. The method is based on numerical gradients with parameters optimized using data sets sampled from analytical functions. The heuristic character of the method reduces the computational requirements and makes it applicable not only to fragment based methods but also to QSARs based on bulk descriptors. The application of the method is illustrated on congeneric QSAR data sets, and it is shown that the predicted influential descriptors can be used to guide structural modifications that affect the biological response in the desired direction. The method is implemented into the AZOrange Open Source QSAR package. The current implementation of localized heuristic inverse QSAR is a step toward a generally applicable method for elucidating the structure activity relationship specifically for a congeneric region of chemical space when using QSARs based on bulk properties. Consequently, this method could contribute to accelerating the chemical design process in pharmaceutical projects, as well as provide information that could enhance the mechanistic understanding for individual scaffolds.
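The idea of ranking descriptors by their local influence on a prediction can be sketched with central-difference numerical gradients. This is only an illustration of the general technique: it uses a fixed step size rather than the optimized parameters the abstract describes, it is not the AZOrange implementation, and `local_gradient` and `toy_model` are hypothetical names with a made-up response function.

```python
def local_gradient(predict, x, step=1e-4):
    """Central-difference estimate of d(predict)/d(x[i]) at the point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += step
        down[i] -= step
        grad.append((predict(up) - predict(down)) / (2.0 * step))
    return grad


def toy_model(d):
    """Hypothetical nonlinear response standing in for a trained QSAR model."""
    return d[0] ** 2 - 3.0 * d[1] + 0.5 * d[0] * d[2]


# Hypothetical descriptor vector of the compound being predicted.
compound = [1.0, 2.0, 4.0]
gradient = local_gradient(toy_model, compound)

# Rank descriptor indices by the magnitude of their local influence.
ranking = sorted(range(len(gradient)), key=lambda i: abs(gradient[i]), reverse=True)
for i in ranking:
    direction = "increases" if gradient[i] > 0 else "decreases"
    print(f"descriptor {i}: gradient {gradient[i]:+.3f} ({direction} the response)")
```

The sign of each gradient suggests the direction in which changing a descriptor would move the predicted response near this compound; the ranking holds only locally and can differ for another compound.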
