2021
DOI: 10.1093/bib/bbab248

Improving protein fold recognition using triplet network and ensemble deep learning

Abstract: Protein fold recognition is a critical step toward protein structure and function prediction, aiming at providing the most likely fold type of the query protein. In recent years, the development of deep learning (DL) technique has led to massive advances in this important field, and accordingly, the sensitivity of protein fold recognition has been dramatically improved. Most DL-based methods take an intermediate bottleneck layer as the feature representation of proteins with new fold types. However, this strat…

Cited by 12 publications (20 citation statements)
References: 54 publications
“…In addition, in order to compare with DeepFRpro [38], CNN-BGRU-RF+ [44], FoldTRpro [43], FSD_XGBoostpro [43], and FoldHSphere [45], which apply a random forest (RF) ensemble, we implemented and tested the same RF strategy in our ensemble approach (see Materials and Methods section). It must be noted that these results cannot be directly compared to the previous ones, as this approach involves additional training of the RF models on the test set in a 10-stage cross-testing manner.…”
Section: Results (mentioning)
confidence: 99%
“…Finally, we compare the performance of our best individual models, as well as the ensemble strategy we propose, against the state-of-the-art results for fold recognition and fold classification. First, we compare to several methods intended for the PFR task, which can be grouped into three categories: (i) alignment and threading methods such as PSI-BLAST [104], HHpred [20], RAPTOR [23], BoostThreader [22], SPARKS-X [24], MRFalign [21], and CEthreader [28]; (ii) machine learning methods such as FOLDpro [29], RF-Fold [31], DN-Fold [31], RFDN-Fold [31]; and (iii) deep learning methods such as DeepFR [38], CNN-BGRU [44], VGGfold [42], FoldTR [43], and FoldHSphere [45]. Table 4 shows the PFR accuracy results achieved by these methods on the LINDAHL test set, as well as the best performing model ProtT5 + LAT and the average ensemble.…”
Section: Comparison With State-of-the-art Methods For Fold Recognitio... (mentioning)
confidence: 99%
“…Ensemble learning is used in many protein tasks and has good performance, such as recognition of multiple lysine PTM sites and the different types of these sites (Qiu et al., 2016a), recognition of phosphorylation sites in proteins (Qiu et al., 2016b) and recognition of protein folds (Liu et al., 2021). The ensemble model usually has better performance than individual predictors.…”
Section: Methods (mentioning)
confidence: 99%
“…In recent years, many protein fold recognition methods have been developed. These methods can be generally classified into two major categories: (1) template alignment methods, which calculate the similarity between query protein and template protein based on sequence–sequence similarity or sequence–structure similarity; and (2) machine learning methods, which treat the matching of the query protein and template protein as a binary classification problem. Regardless of the strategy used, extracting representative features from protein sequences is the key to improving the accuracy of protein fold identification.…”
Section: Introduction (mentioning)
confidence: 99%