2014
DOI: 10.1007/s00726-014-1667-5

A pipeline for improved QSAR analysis of peptides: physiochemical property parameter selection via BMSF, near-neighbor sample selection via semivariogram, and weighted SVR regression and prediction

Abstract: In this paper, we present a pipeline to perform improved QSAR analysis of peptides. The modeling involves a double selection procedure that first performs feature selection and then conducts sample selection before the final regression analysis. Five hundred and thirty-one physicochemical property parameters of amino acids were used as descriptors to characterize the structure of peptides. These high-dimensional descriptors then go through a feature selection process…
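The double selection idea in the abstract (feature selection, then sample selection, then regression) can be illustrated with a minimal sketch. This is not the paper's method: a correlation filter stands in for BMSF, a fixed-k nearest-neighbor rule stands in for semivariogram-based sample selection, and an inverse-distance weighted mean stands in for weighted SVR. All function names and the parameters `top_m` and `k` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def select_features(X, y, top_m):
    """Step 1 (assumed stand-in for BMSF): keep the top_m columns by |correlation| with y."""
    n_cols = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_m])

def select_neighbors(X, query, k):
    """Step 2 (assumed stand-in for semivariogram selection): the k nearest training rows."""
    dists = [(math.dist(row, query), i) for i, row in enumerate(X)]
    return sorted(dists)[:k]

def predict(X, y, query, top_m=2, k=3):
    """Run both selections, then predict the query's activity from its neighbors."""
    cols = select_features(X, y, top_m)
    X_reduced = [[row[j] for j in cols] for row in X]
    q_reduced = [query[j] for j in cols]
    neighbors = select_neighbors(X_reduced, q_reduced, k)
    # Step 3 (assumed stand-in for weighted SVR): inverse-distance weighted mean.
    weights = [1.0 / (d + 1e-9) for d, _ in neighbors]
    return sum(w * y[i] for w, (_, i) in zip(weights, neighbors)) / sum(weights)
```

In this sketch the training subset is rebuilt for every query, so each prediction uses only samples close to it in the reduced descriptor space.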

Cited by 6 publications (8 citation statements)
References 32 publications

“…Therefore, the significant improvement in model performance was achieved by feature selection because plenty of irrelevant features were eliminated. 17 …”
Section: Discussion (mentioning)
confidence: 99%
“…comprising the peptides and properties of the entire peptides (electronegativity, sequence information, solubility, molecular weight, topological information, etc.). 17 , 18 Then, feature selection and modeling methods are combined to connect the structure information and bioactivity. 17 , 19 More than 80 amino acid descriptors (AADs) extracted from properties of amino acids by principal component analysis (PCA) were presented to characterize peptide structures and encode the peptides.…”
Section: Introduction (mentioning)
confidence: 99%
“…This study uses the molecule descriptor calculation software, PCLIENT, to calculate thousands of physiochemical parameters for every small molecule compound of alcohol. 15 Optimum descriptors subset is obtained by a feature selection pipeline containing three step searching strategies: (i) select statistically significant features that imply nonlinear correlation with biotoxicity of chemical compounds using MIC based univariate filter; (ii) refine feature subset by support vector regression based backward elimination (SVR-BE); 16 (iii) obtain optimal subset via a forward selection process that integrated minimal redundancy maximal relevance, MIC and SVR. A QSAR model is finally built on the training set with the reserved descriptors, and then to predict biotoxicities of Rana temporaria in the test set.…”
Section: Introduction (mentioning)
confidence: 99%
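
Step (iii) of the quoted pipeline is a forward selection that balances relevance against redundancy. A minimal sketch of that greedy idea follows, under loud assumptions: absolute Pearson correlation replaces MIC as the dependence measure, no SVR evaluation is performed, and the function names `abs_corr` and `mrmr_forward` are hypothetical rather than taken from the citing paper.

```python
import math

def abs_corr(a, b):
    """Absolute Pearson correlation (assumed stand-in for MIC); 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / math.sqrt(va * vb)

def mrmr_forward(X, y, n_select):
    """Greedy forward selection: maximize relevance minus mean redundancy."""
    n_cols = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_cols)]
    relevance = [abs_corr(c, y) for c in cols]
    selected = []
    remaining = list(range(n_cols))

    def score(j):
        # With nothing selected yet, relevance alone decides the first pick.
        if not selected:
            return relevance[j]
        redundancy = sum(abs_corr(cols[j], cols[s]) for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, when one column is an exact copy of an already selected column, its redundancy term is 1.0, so a less relevant but less redundant column is chosen ahead of it.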