2023
DOI: 10.1101/2023.04.25.538237
Preprint
STAPLER: Efficient learning of TCR-peptide specificity prediction from full-length TCR-peptide data

Abstract: The prediction of peptide-MHC (pMHC) recognition by αβ T-cell receptors (TCRs) remains a major biomedical challenge. Here, we develop STAPLER (Shared TCR And Peptide Language bidirectional Encoder Representations from transformers), a transformer language model that uses a joint TCRαβ-peptide input to allow the learning of patterns within and between TCRαβ and peptide sequences that encode recognition. First, we demonstrate how data leakage during negative data generation can confound per…
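The joint TCRαβ-peptide input described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the special-token layout is an assumption borrowed from BERT-style encoders, and the example sequences are made up.

```python
# Illustrative sketch only: one way to build the joint TCRαβ-peptide
# input described in the abstract, BERT-style. Token layout is assumed.
CLS, SEP = "[CLS]", "[SEP]"

def joint_input(tcr_alpha, tcr_beta, peptide):
    """Concatenate TCRα, TCRβ and peptide amino-acid sequences into a
    single token list, so a bidirectional encoder can attend both
    within and between the TCR and peptide segments."""
    return ([CLS] + list(tcr_alpha) + [SEP]
                  + list(tcr_beta)  + [SEP]
                  + list(peptide)   + [SEP])

tokens = joint_input("CAVSDL", "CASSIR", "GILGFVFTL")
```

Feeding one concatenated sequence (rather than encoding TCR and peptide separately) is what lets self-attention pick up cross-molecule patterns that encode recognition.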

Cited by 11 publications (5 citation statements)
References 43 publications
“…Decoding the predictive rules of TCR-pMHC specificity is a formidable challenge, largely owing to the extreme sparsity of available training data relative to the diversity of sequences that need to be interrogated in meaningful investigation. A majority of approaches ( 11 , 36 , 37 ) take a complementary approach to RACER-m by training on TCR and/or peptide primary sequence data alone. One recent method achieves training by relaxing a common requirement of having paired CDR3α/β sequences ( 36 ).…”
Section: Discussion
Mentioning confidence: 99%
“…Concurrently, the development of large language models for proteins has led to important advancements in protein structure, function and evolution predictions [23,41,42]. While several studies have applied TCR-specific pLMs to TCR prediction [38,43], extensive research on the application of general pLMs, such as ESM, is lacking.…”
Section: Discussion
Mentioning confidence: 99%
“…Schumacher furthermore introduced STAPLER (Shared TCR And Peptide Language bidirectional Encoder Representations from transformers), a transformer language model to predict reactivity from full-sequence TCR and peptide-MHC (pMHC) input. 28 STAPLER training data leveraged large available databases on TCRs with unknown reactivity and pMHC complexes with unknown TCR matches to teach the algorithm what these molecules look like and which patterns they entail. In a second step, smaller data sets of matching TCR-pMHC pairs were used to achieve accurate prediction of TCR reactivity but only for those antigens for which training data are available.…”
Section: Keynote Lecture
Mentioning confidence: 99%
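The two-step training described in the statement above (self-supervised pretraining on large unpaired TCR and pMHC datasets, then fine-tuning on matched TCR-pMHC pairs) rests on masked-language-model pretraining. A minimal sketch of the masking step follows; the masking rate, token names, and example sequence are illustrative assumptions, not the paper's hyperparameters.

```python
import random

def mask_tokens(tokens, p=0.15, seed=0):
    """MLM-style masking: hide a random fraction of amino-acid tokens.
    During pretraining the model learns to reconstruct the hidden
    residues from unpaired TCR and pMHC data; special tokens stay
    visible. Returns the masked sequence and the positions to predict."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if tok not in ("[CLS]", "[SEP]") and rng.random() < p:
            targets[i] = tok          # ground truth the model must predict
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

seq = ["[CLS]"] + list("CASSIRSSYEQYF") + ["[SEP]"]
masked, targets = mask_tokens(seq, p=0.3)
```

In the second step, the pretrained encoder would be fine-tuned as a binary classifier on the smaller paired datasets, which is why accurate prediction is limited to antigens with available training data.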