2020
DOI: 10.1101/2020.06.15.153643
Preprint

Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks

Abstract: The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification a…
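Below is a minimal, illustrative sketch of the pretrain-then-fine-tune pattern the abstract describes: a small Transformer encoder over amino-acid tokens with a classification head for protein family prediction. The architecture, dimensions, vocabulary handling, and the ProteinFamilyClassifier name are assumptions made for illustration, not the authors' published configuration.

# Hypothetical sketch (not the authors' model): Transformer encoder over
# amino-acid tokens, mean-pooled and fed to a protein-family classification head.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                  # 20 standard residues
PAD = 0                                                # padding token id (assumed)
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

class ProteinFamilyClassifier(nn.Module):
    def __init__(self, n_families, d_model=128, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB) + 1, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_families)     # fine-tuned classification head

    def forward(self, tokens):                          # tokens: (batch, seq_len) long
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h = self.embed(tokens) + self.pos(pos)
        h = self.encoder(h, src_key_padding_mask=tokens.eq(PAD))
        return self.head(h.mean(dim=1))                 # mean-pool residues, then classify

def encode(seq, max_len=512):
    ids = [VOCAB[aa] for aa in seq[:max_len]]
    return torch.tensor(ids + [PAD] * (max_len - len(ids)))

model = ProteinFamilyClassifier(n_families=100)
logits = model(encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ").unsqueeze(0))
print(logits.shape)                                     # torch.Size([1, 100])

In the workflow the abstract outlines, the encoder weights would first be learned with a self-supervised objective on unlabeled sequences (see the masked-residue sketch further below) and only then fine-tuned with labeled family annotations.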

Cited by 65 publications (79 citation statements)
References 49 publications
“…Self-supervised pretraining has been shown to boost model performance for natural language processing and computer vision tasks (Devlin et al., 2019; Chen et al., 2020). Recent research has also shown the potential benefits of self-supervised pretraining on protein related tasks (Rao et al., 2019; Nambiar et al., 2020), such as contact prediction. However, to date, no work has explored self-supervised pretraining on MHC–peptide related tasks.…”
Section: Results
confidence: 99%
“…Such models are trained to predict words masked out in a sentence or to predict the next word or sentence following some context. Similar techniques have also been applied to proteins (Rao et al., 2019; Nambiar et al., 2020; Heinzinger et al., 2019). Since these models do not require labels to train, they can be trained on very large corpora of protein sequences across many species.…”
Section: Introduction
confidence: 99%
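As a concrete illustration of the masked-token objective described in the excerpt above, the sketch below corrupts roughly 15% of the residues in a batch of protein sequences and trains an encoder to recover them; no labels are required, so any large sequence corpus can be used. All names, hyperparameters, and the toy sequences are assumptions for illustration, not any cited paper's exact setup.

# Minimal masked-residue (BERT-style) pretraining sketch for protein sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, MASK = 0, 1                                       # special token ids (assumed)
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}
VOCAB_SIZE = len(VOCAB) + 2

def mask_tokens(tokens, mask_prob=0.15):
    """Randomly replace residues with MASK; return corrupted inputs and reconstruction targets."""
    targets = tokens.clone()
    maskable = tokens.ne(PAD)
    chosen = (torch.rand(tokens.shape, device=tokens.device) < mask_prob) & maskable
    if not chosen.any():                               # guarantee at least one masked position
        chosen = maskable & (torch.cumsum(maskable.long(), dim=-1) == 1)
    targets[~chosen] = -100                            # loss is computed only at masked positions
    return tokens.masked_fill(chosen, MASK), targets

class MaskedResidueModel(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, VOCAB_SIZE)  # predicts the identity of masked residues

    def forward(self, tokens):
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h = self.encoder(self.embed(tokens) + self.pos(pos),
                         src_key_padding_mask=tokens.eq(PAD))
        return self.lm_head(h)                         # (batch, seq_len, VOCAB_SIZE)

# One self-supervised training step on a toy batch; no labels are needed.
seqs = ["MKTAYIAKQRQISFVK", "GSHMLEDPVAGK"]
batch = torch.stack([torch.tensor([VOCAB[a] for a in s] + [PAD] * (32 - len(s)))
                     for s in seqs])
model = MaskedResidueModel()
inputs, targets = mask_tokens(batch)
logits = model(inputs)
loss = F.cross_entropy(logits.view(-1, VOCAB_SIZE), targets.view(-1), ignore_index=-100)
loss.backward()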
“…Recent studies show that models based on natural language processing inspired techniques such as Transformer, [217] BERT, [218] and GPT-2 [219] can learn features from a large corpus of protein sequences in a self-supervised fashion, with applications in a variety of downstream tasks. [220,221] Besides a linear sequence of amino acids, proteins can also be modeled as a graph to capture both structure and sequence information. Graph neural networks [222] are powerful deep learning architectures for learning representations of nodes and edges from such data.…”
Section: Discussion
confidence: 99%
“…Natural language processing models, specifically language modeling techniques, have also made an impact in the domain of COVID-19 vaccine discovery. Pre-trained transformers were used to predict protein interaction (Nambiar et al., 2020) and model molecular reactions in carbohydrate chemistry (Pesciullesi et al., 2020), which can be utilized in the process of vaccine development. Chen et al. discussed the use case of an LSTM-based seq-2-seq model for predicting the secondary structure of certain SARS-CoV-2 proteins (Karpov et al., 2019).…”
Section: COVID-19 Vaccine Discovery
confidence: 99%