2015
DOI: 10.1186/s12859-015-0633-x

PredSTP: a highly accurate SVM based model to predict sequential cystine stabilized peptides

Abstract: Background: Numerous organisms have evolved a wide range of toxic peptides for self-defense and predation. Their effective interstitial and macro-environmental use requires energetic and structural stability. One successful group of these peptides includes a tri-disulfide domain arrangement that offers toxicity and high stability. Sequential tri-disulfide connectivity variants create highly compact disulfide folds capable of withstanding a variety of environmental stresses. Their combination of toxicity and stab…

Cited by 25 publications (23 citation statements)
References 59 publications

“…To cope with the daily increase of the number of sequences in the UniProt database, future efforts will be put in the full automation of the update pipeline. Regarding the latest data, it is interesting to observe that, while sequences have been found in animals, plants, fungi, bacteria and viruses, knottins are still absent from archaea, which converges with previous findings ( 13 ). Therefore, particular attention will be paid to new data about these organisms, which represent one of the three domains of life.…”
Section: Discussion | supporting
confidence: 87%

“…We used a publicly available AMP database to access AMP sequences and metadata, but first applied filters to narrow the pool to those peptides of greatest practical value for heterologous expression in plants. Since we were most interested in peptides possessing the highly stable sequential tri-disulfide peptide (STP) structure, we used our PredSTP algorithm (Islam et al, 2015) to narrow the pool of AMPs gathered from the AMP database to only STPs. We further narrowed the pool to only peptides 30-50 amino acids in length and eliminated redundant sequences (80% sequence similarity cutoff), resulting in a final data set of 96 STP-AMPs of plant origin and 58 STP-AMPs of non-plant origin (Supplemental File 1).…”
Section: Results | mentioning
confidence: 99%

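The length and redundancy filters described in the passage above can be sketched as follows. This is a minimal illustration under stated assumptions: the quoted study does not say which tool computed the 80% similarity, so `difflib.SequenceMatcher.ratio()` is used here as a crude stand-in for a proper alignment- or clustering-based identity (such as CD-HIT), and the function names `similarity` and `filter_peptides` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]. Assumption: stands in for a real
    alignment- or clustering-based sequence identity."""
    return SequenceMatcher(None, a, b).ratio()


def filter_peptides(seqs, min_len=30, max_len=50, cutoff=0.8):
    """Keep sequences of min_len..max_len residues, then greedily drop any
    sequence whose similarity to an already-kept one is >= cutoff."""
    in_range = [s for s in seqs if min_len <= len(s) <= max_len]
    # Longest first (stable sort), so of two redundant sequences the longer is kept.
    in_range.sort(key=len, reverse=True)
    kept = []
    for s in in_range:
        if all(similarity(s, k) < cutoff for k in kept):
            kept.append(s)
    return kept
```

The greedy, longest-first pass mirrors how clustering-based redundancy removal is commonly done; swapping in a true pairwise identity function would change only `similarity`.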
“…We have added to the capacity of this developmental pipeline by developing several algorithms for detecting AMPs from genomic databases. We can now predict sequential tri-disulfide peptides (STPs) from genomes using a support vector machine algorithm (Islam et al., 2015). STPs are the predominant structural form of AMPs and are sturdy, robust structures (Islam et al., 2018b).…”
Section: Discussion | mentioning
confidence: 99%

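PredSTP is described here only as SVM-based; its kernel, features and training procedure are not given in these statements. As a hedged illustration of the general technique, the sketch below trains a linear soft-margin SVM with the Pegasos stochastic sub-gradient method on hypothetical two-dimensional toy data. `train_linear_svm`, `predict` and the parameter values are assumptions, not PredSTP's implementation.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal objective
    (lam/2)*||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)), labels +1/-1.
    A bias is modeled by appending a constant 1.0 feature (so it is regularized too)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step for the regularizer: shrink w.
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient applies only when the margin is violated.
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function, with the appended bias feature."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In a sequence setting, each row of `X` would be a fixed-length feature vector derived from a peptide (for example, its amino acid composition), with +1 for STPs and -1 for non-STPs; a kernel SVM would replace the dot product with a kernel function.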
“…Sequence classification using supervised and unsupervised machine learning methods is becoming popular due to algorithm accessibility in conjunction with increasing amounts of available biological data. Recent work in this area includes the classification of protein structure (Islam et al, 2015), localization (Yu and Hwang, 2008), function (Cai et al, 2003), family (Chou, 2005) and protein-protein interaction (PPI) (Zhao et al, 2012; Yu and Hwang, 2008) based on primary sequence. These studies consistently report that ML approaches are superior to alignment based predictions when deriving protein characteristics from primary sequence, and perform effectively in protein groups with low sequence similarity.…”
Section: Introduction | mentioning
confidence: 99%

“…While universal methods for feature extraction are problematic due to the wide range of classification strategies, several generalized feature generation methods have been proposed. Many of these methods aim to address specific classification problems (Islam et al, 2015; Bock and Gough, 2001; Dyrløv Bendtsen et al, 2004), while others may be implemented as semiautomated feature generators. For example, amino acid composition (Verma and Melcher, 2012) and pseudo-amino acid composition (Du et al, 2014) based feature extraction schemes have been successfully used to solve a range of classification problems (Garg et al, 2005; Xu et al, 2013; Qiu et al, 2016; Tiwari, 2016).…”
Section: Introduction | mentioning
confidence: 99%
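Amino acid composition, the simplest of the feature schemes named above, maps a sequence of any length to a fixed 20-dimensional vector that a classifier such as an SVM can consume. A minimal sketch, assuming the standard definition (fraction of each of the 20 standard residues, in alphabetical one-letter order) and ignoring non-standard letters; the cited works may normalize or extend it differently, and `aa_composition` is a hypothetical name.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes


def aa_composition(seq: str) -> list[float]:
    """Fraction of each standard residue in seq, in STANDARD_AA order.
    Non-standard letters (e.g. X, B, Z) are skipped and excluded from the total."""
    counts = {aa: 0 for aa in STANDARD_AA}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        # Empty or all-non-standard input: return a zero vector rather than divide by 0.
        return [0.0] * len(STANDARD_AA)
    return [counts[aa] / total for aa in STANDARD_AA]
```

Because composition discards residue order, pseudo-amino acid composition appends sequence-order correlation terms, computed from physicochemical properties, to this kind of vector.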