2023
DOI: 10.3390/app13052858

Identification of Thermophilic Proteins Based on Sequence-Based Bidirectional Representations from Transformer-Embedding Features

Abstract: Thermophilic proteins have great potential to be utilized as biocatalysts in biotechnology. Machine learning algorithms are gaining increasing use in identifying such enzymes, reducing or even eliminating the need for experimental studies. While most previously used machine learning methods were based on manually designed features, we developed BertThermo, a model using Bidirectional Encoder Representations from Transformers (BERT), as an automatic feature extraction tool. This method combines a variety of mac…

Cited by 12 publications (9 citation statements)
References 60 publications
“…To compare TemStaPro with modern deep learning-based methods, it was tested on the iThermo (Ahmed et al 2022) test dataset. The DeepTP (Zhao et al 2023) and BertThermo (Pei et al 2023) performance results were taken from the BertThermo publication. Among BertThermo, DeepTP, and iThermo, BertThermo achieved the best evaluation classification scores.…”
Section: Results | mentioning
confidence: 99%
“…It was only a matter of time before pLM embeddings were also applied for the identification of thermostable proteins. To the best of our knowledge, BertThermo (Pei et al 2023) was the first such method that was published. However, BertThermo was trained using only 2803 sequences and was based on a binary classifier for only a single temperature threshold.…”
Section: Introduction | mentioning
confidence: 99%
“…It integrates various deep learning methods into a self-contained protein Tm predictor. In contrast to other predictors like SCooP, DeepTP, and BertThermo essentially classifying thermostable and non-thermostable proteins, Prostab2 and DeepSTABp both employ a regression model to predict continuous Tm values [13, 15, 16, 17, 18]. Through proper sampling techniques, we trained the model for extensive prediction tasks and achieved superior results compared to the current state-of-the-art method, ProTstab2, by reducing the mean average prediction error by around 35 percent (Table 1, MAE).…”
Section: Discussion | mentioning
confidence: 99%
“…Further, available algorithms for thermal stability prediction based on cell-wide analysis of protein stability TPP experiment differ regarding the definition of the learning problem. DeepTP [17] and BertThermo [18] approaches construct a classification problem to distinguish between thermostable and thermolabile proteins, but do not intend to predict Tm values. Although classification-based predictors have demonstrated outstanding performance, they simplify the prediction task by reducing the number of output classes to a discrete set, whereas regression-based analysis captures the continuous nature of Tm values.…”
Section: Introduction | mentioning
confidence: 99%
“…ProtTrans [35], a family of models including protBERT, leverages transformers to extract protein characteristics from sequence data. BertThermo [36] uses the protBERT embeddings with classical machine learning models for thermophilicity classification, whereas DeepSTABp incorporates ProtTrans-XL embeddings and growth temperature to predict protein melting temperature [37]. Similarly, TemStaPro [38] is an ensemble of models incorporating ProtT5-XL [35] embeddings to feed-forward densely connected neural network models, and ProLaTherm [39] integrates the encoder part of a T5-3B [40] model with ProtT5-XL [35] as the feature extractor.…”
Section: Introduction | mentioning
confidence: 99%