2021
DOI: 10.48550/arxiv.2111.07137
Preprint

Interpreting BERT architecture predictions for peptide presentation by MHC class I proteins

Abstract: The major histocompatibility complex (MHC) class I pathway supports the detection of cancer and viruses by the immune system. It presents parts of proteins (peptides) from inside a cell on its membrane surface, enabling visiting immune cells that detect non-self peptides to terminate the cell. The ability to predict whether a peptide will be presented on MHC class I molecules helps in designing vaccines so they can activate the immune system to destroy the invading disease protein. We designed a prediction mode…
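To make the task concrete, here is a minimal sketch of the kind of classifier the abstract describes: a BERT-style protein language model that scores a peptide together with an MHC class I pseudo-sequence. It is not ImmunoBERT itself (which builds on the TAPE encoder and on trained weights not available here); the Rostlab/prot_bert checkpoint, the freshly initialised classification head, and the example peptide/pseudo-sequence pair are all assumptions made for illustration.

# Hedged sketch: score one peptide + MHC pseudo-sequence pair with a BERT-style
# protein language model. The classification head below is randomly initialised,
# so the output is a placeholder until the model is fine-tuned on labelled
# presentation data.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

MODEL_ID = "Rostlab/prot_bert"  # assumed stand-in for the TAPE encoder used in the paper
tokenizer = BertTokenizer.from_pretrained(MODEL_ID, do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
model.eval()

# ProtBert expects residues separated by spaces; peptide and MHC pseudo-sequence
# are passed as a sentence pair. Both inputs are hypothetical examples.
peptide = " ".join("SIINFEKL")
mhc_pseudo = " ".join("YFAMYQENMAHTDANTLYIIYRDYTWAARVYRGY")
inputs = tokenizer(peptide, mhc_pseudo, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
presented_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"presentation score (untrained head, illustrative only): {presented_prob:.3f}")

Fine-tuning would attach a binary cross-entropy objective to this head and train on labelled (peptide, MHC allele, presented / not presented) examples.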

Cited by 3 publications (5 citation statements)
References 20 publications

“…Protein language models developed based on deep learning approaches, such as attention-based transformer models, have shown significant progress towards solving a number of challenging problems in biology, most importantly the protein structure prediction problem (Jumper et al, 2021). BERTMHC (Cheng et al, 2020) and ImmunoBERT (Gasser et al, 2021) were the first to apply pre-trained protein language models to MHC-peptide binding problems. Both methods used a relatively small pre-trained model (TAPE (Rao et al, 2019), trained on 31 million protein sequences); currently, there are substantially larger and more informative models such as ESM1b and ProtTrans, which are trained on more than 250 million protein sequences. … ESM-GAT fine-tuning outperforms the ESM fine-tuning method when the test set with peptides of length 10-15 is considered (red points), while the results are almost the same when using the test set with peptides of length 8 and 9 (blue points).…”
Section: Discussion
confidence: 99%
“…Another BERT-based model known as ImmunoBERT (Gasser et al, 2021) applies pre-trained transformer models to the MHC class I-peptide binding problem. As reported in this work, they were not able to compare their model's performance fairly with NetMHCpan (Reynisson et al, 2020) and MHCflurry (O'Donnell et al, 2020) due to a lack of access to the same training set.…”
Section: Introduction
confidence: 99%
“…Therefore, the models are not evaluated in terms of which downstream tasks can be applied via transfer learning. Recently, attempts have appeared that utilize large language models in repertoire analysis (133-137, 180). In AntiBERTa (137), fine-tuning for a downstream task is also investigated.…”
Section: Discussion
confidence: 99%
“…Protein language models developed based on deep learning approaches, such as attention-based transformer models, have shown significant progress towards solving a number of challenging problems in biology, most importantly protein structure prediction [48]. BERTMHC [35] and ImmunoBERT [36] were the first to apply pre-trained protein language models to MHC-peptide binding problems. Both methods used a relatively small pre-trained model (TAPE [26], trained on 31 million protein sequences); currently, there are substantially larger and more informative models such as ESM1b [34] and ProtTrans [32], which are trained on more than 250 million protein sequences.…”
Section: Discussion
confidence: 99%
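The statement above contrasts TAPE with larger encoders such as ESM1b and ProtTrans. As a hedged sketch of what swapping in such a model involves (this uses the public fair-esm package, not the cited authors' pipeline), the snippet below extracts a fixed-size ESM-1b embedding for one peptide; the downstream peptide-MHC classifier that would sit on top of it is assumed and not shown.

# Hedged sketch: per-peptide embedding from the ESM-1b protein language model
# via the fair-esm package (pip install fair-esm). The example sequence is illustrative.
import torch
import esm

model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("example_peptide", "SIINFEKL")]
_, _, batch_tokens = batch_converter(data)

with torch.no_grad():
    out = model(batch_tokens, repr_layers=[33], return_contacts=False)
token_reprs = out["representations"][33]          # (batch, tokens, 1280)

# Mean-pool over residue positions (skipping the BOS/EOS tokens) to get one
# vector per peptide, which a small classifier head could then consume.
peptide_len = len(data[0][1])
peptide_embedding = token_reprs[0, 1 : peptide_len + 1].mean(dim=0)
print(peptide_embedding.shape)                    # torch.Size([1280])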
“…They show that models generated from transfer learning can achieve better performance on both binding and presentation prediction tasks compared to NetMHCIIpan4.0 (the latest version of NetMHCpan for MHC class II [2]). Another BERT-based model known as ImmunoBERT [36] applies pre-trained transformer models to the MHC class I problem. Although they try to interpret how the BERT architecture works in MHC-peptide binding prediction, they could not compare their model fairly with NetMHCpan [2] and MHCflurry [4] due to a lack of access to the same training set.…”
Section: Introduction
confidence: 99%