2021
DOI: 10.1101/2021.04.25.441334
Preprint
Light Attention Predicts Protein Location from the Language of Life

Abstract: Although knowing where a protein functions in a cell is important to characterize biological processes, this information remains unavailable for most known proteins. Machine learning narrows the gap through predictions from expertly chosen input features leveraging evolutionary information that is resource-expensive to generate. We showcase using embeddings from protein language models for competitive localization predictions not relying on evolutionary information. Our lightweight deep neural network architec…
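The abstract describes a lightweight architecture that pools per-residue embeddings into a fixed-size per-protein representation via learned attention ("light attention"). A minimal numpy sketch of attention-weighted pooling is below; the function names, projection matrices, and dimensions are illustrative assumptions, not the paper's exact architecture (which learns its parameters and uses convolutions over the sequence).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(residue_emb, w_att, w_val):
    # residue_emb: (L, d) per-residue embeddings from a protein language model.
    # w_att, w_val: (d, d) projections (random here; learned in a real model).
    scores = residue_emb @ w_att           # (L, d) per-residue attention scores
    weights = softmax(scores, axis=0)      # normalize over sequence length, per channel
    values = residue_emb @ w_val           # (L, d) transformed values
    return (weights * values).sum(axis=0)  # (d,) fixed-size protein embedding

rng = np.random.default_rng(0)
L, d = 120, 32                             # hypothetical protein length and embedding size
emb = rng.standard_normal((L, d))
pooled = attention_pool(emb, rng.standard_normal((d, d)), rng.standard_normal((d, d)))
print(pooled.shape)  # (32,)
```

Because the softmax runs over the sequence dimension, the pooled vector has the same size for proteins of any length, which is what lets a downstream classifier predict localization from variable-length sequences.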


Cited by 14 publications (9 citation statements) · References 59 publications
“…Embeddings can outperform homology-based inference based on the traditional sequence comparisons optimized over five decades (Littmann, Bordin, et al., 2021). With little additional optimization, methods using only embeddings without any MSA even outperform advanced methods relying on MSAs (Elnaggar et al., 2021; Stärk et al., 2021). In the simplest form, embeddings mirror the last "hidden" states/values of pLMs.…”
Section: Introduction
confidence: 99%
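The statement above notes that, in the simplest form, embeddings are the last hidden states of a pLM. A common way to turn these per-residue states into a single per-protein vector is mean-pooling over the sequence; the sketch below uses random numbers as a stand-in for real pLM output, and the hidden size of 1024 is an assumption (it matches ProtBERT-style models but is not taken from this page).

```python
import numpy as np

# Stand-in for the last hidden layer of a pLM for a 150-residue protein:
# one 1024-dimensional vector per residue (random here, for illustration).
rng = np.random.default_rng(42)
per_residue = rng.standard_normal((150, 1024))

# Per-protein embedding: average over the sequence dimension, giving a
# fixed-size vector regardless of protein length.
per_protein = per_residue.mean(axis=0)
print(per_protein.shape)  # (1024,)
```

Such fixed-size vectors are what downstream predictors (localization, annotation transfer) consume; the light-attention idea in the preprint replaces this uniform average with learned, position-specific weights.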
“…On the other hand, it was already shown that methods using embeddings from protein LMs as input come very close to, or even outperform, methods that use evolutionary constraints defined by MSAs as input (Elnaggar et al. 2021; Rao et al. 2020; Stärk et al. 2021). Therefore, we can currently only conclude that the simplest approach of using reconstruction probabilities from ProtBert without further processing is not readily suitable to scan the mutational landscape of proteins.…”
Section: Discussion
confidence: 94%
“…1 in (Elnaggar et al 2021)). Embeddings have been used successfully as exclusive input to predicting secondary structure and subcellular localization at performance levels almost reaching (Alley et al 2019;Heinzinger et al 2019;Rives et al 2021) or even exceeding (Elnaggar et al 2021;Stärk et al 2021) state-of-the-art methods using evolutionary information from MSAs as input. Embeddings can even substitute sequence similarity for homology-based annotation transfer (Littmann et al 2021a;Littmann et al 2021b).…”
Section: Introduction
confidence: 99%
“…1 in (Elnaggar et al 2021)). Embeddings have succeeded as exclusive input to predicting secondary structure and subcellular location at performance levels almost reaching (Alley et al 2019;Heinzinger et al 2019;Rives et al 2021) or even exceeding (Elnaggar et al 2021;Littmann et al 2021c;Stärk et al 2021) state-of-the-art (SOTA) methods using EI from MSAs as input. Embeddings even succeed in substituting sequence similarity for homology-based annotation transfer (Littmann et al 2021a;Littmann et al 2021b) and in predicting the effect of mutations on protein-protein interactions (Zhou et al 2020).…”
Section: Introduction
confidence: 99%