2022
DOI: 10.1093/bioinformatics/btac351
Prior knowledge facilitates low homologous protein secondary structure prediction with DSM distillation

Abstract: Motivation: Protein secondary structure prediction (PSSP) is one of the fundamental and challenging problems in computational biology. Accurate PSSP relies on sufficient homologous protein sequences to build a multiple sequence alignment (MSA). Unfortunately, many proteins lack homologous sequences, which results in low-quality MSAs and poor performance. In this paper, we propose the novel DSM-Distil to tackle this issue, which takes advantage of the pretrained BERT and ex…


Cited by 6 publications (4 citation statements)
References 14 publications
“…There are Transformer models that utilize evolutionary information extracted from MSAs during the pre-training stage, but pre-training is mostly done as a one-off process, and representations for new proteins are extracted using only the pretrained hidden states of the Transformer models. MSA tools generate alignments by searching for homologs across the entire UniProt database, a time-consuming process (Hong et al., 2021), whereas generating embeddings with protein language models is less cumbersome and also builds richer and more complete features for low-homology proteins (Wang et al., 2022).…”
Section: Solving Protein Prediction Tasks Using Transformers
Confidence: 99%
“…The generated feature list can be used for any fine-tuning task in proteome bioinformatics. Some common applications of NLP models in proteomics are protein function prediction [10,11], protein-protein interaction prediction [20,21], protein structure prediction [22], drug discovery [23], etc. For instance, protein-protein interaction prediction aids in understanding cellular processes by identifying potential interactions between proteins, enabling the exploration of complex molecular networks and signaling pathways within cells.…”
Section: Significance of the Study
Confidence: 99%
“…Wang et al. proposed DSM-Distil, a Transformer-based model that uses DSM distillation to facilitate low-homology protein secondary structure prediction [22]. Furthermore, Villegas-Morcillo et al. built a Transformer model using word embeddings for fold prediction [71].…”
Section: Protein Function Prediction
Confidence: 99%
“…The experimental results showed that using PSSM-Distil instead of the standard PSSM increased PSSP accuracy in low-quality PSSM cases on the BC40 and CB513 datasets. Subsequently, the same researchers proposed a dynamic scoring matrix (DSM)-Distil to replace the PSSM and other widely used features [165]. This feature leverages the pretrained BERT to construct a dynamic scoring matrix (DSM) and performs knowledge distillation on the DSM.…”
Section: PSSP in Post-AlphaFold Publications
Confidence: 99%
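The "knowledge distillation on the DSM" step described above can be illustrated with a standard temperature-scaled distillation loss. This is a generic sketch of the distillation objective, not DSM-Distil's published loss: each row of the matrix is assumed to be one residue's 20-dimensional amino-acid scoring profile, and the student's softened profile is pulled toward the teacher's via KL divergence.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dsm_distill_loss(student_logits, teacher_logits, T=2.0):
    # Temperature-scaled distillation: KL(teacher || student),
    # averaged over residues. Logits are (L x 20) scoring
    # matrices; T > 1 softens both profiles so the student also
    # learns the teacher's relative preferences among amino acids.
    p = softmax(teacher_logits / T, axis=-1)  # soft teacher profiles
    q = softmax(student_logits / T, axis=-1)  # soft student profiles
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * T * T)  # T^2 rescaling, as in Hinton-style distillation
```

The loss is zero when the student matrix reproduces the teacher matrix exactly and grows as the profiles diverge, which is the property a distilled scoring matrix needs when the teacher's high-quality profile (e.g. from a rich MSA) is unavailable at inference time.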