2023
DOI: 10.1186/s12859-023-05549-w

Protein language models can capture protein quaternary state

Orly Avraham,
Tomer Tsaban,
Ziv Ben-Aharon
et al.

Abstract: Background: Determining a protein’s quaternary state, i.e. the number of monomers in a functional unit, is a critical step in protein characterization. Many proteins form multimers for their activity, and over 50% are estimated to naturally form homomultimers. Experimental quaternary state determination can be challenging and require extensive work. To complement these efforts, a number of computational tools have been developed for quaternary state prediction, often utilizing experimentally val…

citations
Cited by 4 publications
(5 citation statements)
references
References 37 publications
“…Furthermore, transformer models have revolutionized the field by enabling the development of large Protein Language Models (LPLMs), which have emerged as transformative tools in computational biology and bioinformatics [23,24]. Similar to large Protein Language Models (LLMs) of NLP trained on large corpora of words [25], similar efforts have been applied to train LPLMs using large protein databases such as BFD100, UniRef50, and UniRef100 with trillions of protein sequences.…”
Section: Related Work (mentioning)
confidence: 99%
“…At the same time, there is also a considerable proportion of proteins with over 30% similarity but different NS. So, relying solely on sequence similarity (e.g., a threshold of 30%) to predict the oligomeric state of proteins may not always be reliable, a point that has been mentioned in earlier research [21] as well.…”
Section: Processing and Analyzing Datasets Extracted From Uniprot (mentioning)
confidence: 99%
“…Recent advancements in deep learning have shown promise in predicting protein's quaternary state. Protein language models, utilizing computational natural language processing techniques for proteins, have successfully captured secondary structure, protein cellular localization, and other features from amino acid sequences [21]. This raises the question: can a protein's quaternary state be inferred solely from its sequence?…”
Section: Introduction (mentioning)
confidence: 99%
“…More computationally efficient methods to fold large protein oligomers, such as Uni-Fold Symmetry [13] still require the pre-specified symmetry group as input to make predictions. Protein embeddings from ESM2 [14] have been used to predict the most likely quaternary state of a protein chain (QUEEN [15]); however, in this approach, the model only predicts the multiplicity of the oligomer thereby giving no clue as to global symmetry of the protein.…”
Section: Introduction (mentioning)
confidence: 99%