2023
DOI: 10.1093/femsre/fuad003

Elucidating the functional roles of prokaryotic proteins using big data and artificial intelligence

Abstract: Annotating protein sequences according to their biological functions is one of the key steps in understanding microbial diversity, metabolic potentials and evolutionary histories. However, even in the best-studied prokaryotic genomes, not all proteins can be characterized by classical in vivo, in vitro, and/or in silico methods—a challenge rapidly growing alongside the advent of Next Generation Sequencing technologies and their enormous extension of ‘omics’ data in public databases. These so-called hypothetica…

Citations: cited by 8 publications (5 citation statements)
References: 206 publications (142 reference statements)

“…Some models predicting EC numbers have integrated structural data to identify active site signature residues, improving the separation of non-isofunctional paralogous subgroups when these harbor experimentally validated members [17,29,84]. While EC prediction models outperform BLAST [78,80,83], they are less accurate than the curated annotation databases such as SwissProt or KEGG [85] and not accurate enough to reliably annotate enzymes for many practical purposes, particularly when trying to reach the level of substrate specificity (or the fourth EC number [86]). Based on the precedent of the “AlphaFold moment” that occurred for structure predictions [87], it is highly probable that as these PLM-driven approaches improve, they could become mainstream tools to help correctly propagate known functional knowledge among isofunctional proteins.…”
Section: Results | Citation type: mentioning
confidence: 99%
“…Recently, there has been an explosion of publications reporting the use of pre-trained Protein Language Models (PLM) to predict protein functions [29][30][31][32][33][78][79][80] (Fig. 2).…”
Section: Computational Models Could Help Propagate the Experimentally... | Citation type: mentioning
confidence: 99%
“…Overall, only 9.5% of the predicted protein-encoding sequences could be aligned to characterized protein-encoding genes with an e-value below 0.01, indicating that although some functional information could be assigned based on structural information contained in HMMs, the majority of genes nonetheless represent novel variants with low sequence similarity to any previously reported genes. This shows an enormous knowledge gap regarding so-called hypothetical proteins, also referred to as functional dark matter (Ardern et al, 2023). For up to 26% of these, the query yielded multiple hits to Archaea and Bacteria in the nr database, making taxonomic classification ambiguous.…”
Section: Discussion | Citation type: mentioning
confidence: 99%
“…One caveat of homology-based annotation methods is their current inability to assign function to a large fraction of genes (Price et al 2018). Here, deep learning-based annotation approaches hold promise in expanding the types of genes where function can be determined computationally, with the number of published methods surging in recent time (Ardern et al 2023). Various deep learning algorithms have been applied for functional predictions (Cao et al 2017, Gligorijević et al 2021, Sanderson et al 2023, Yu et al 2023), among these the Transformer deep contextual language models (Brandes et al 2022), that were also applied in the construction of ChatGPT (OpenAI 2023).…”
Section: Knowledge-driven Approaches | Citation type: mentioning
confidence: 99%