Prediction of T-cell receptor (TCR) interactions with MHC-peptide complexes remains highly challenging. This challenge is primarily due to three dominant factors: data accuracy, data scarceness, and problem complexity. Here, we showcase that “shallow” convolutional neural network (CNN) architectures are adequate to deal with the problem complexity imposed by the length variations of TCRs. We demonstrate that current public bulk CDR3β-pMHC binding data overall is of low quality and that the development of accurate prediction models is contingent on paired α/β TCR sequence data corresponding to at least 150 distinct pairs for each investigated pMHC. In comparison, models trained on CDR3α or CDR3β data alone demonstrated a variable and pMHC specific relative performance drop. Together these findings support that T-cell specificity is predictable given the availability of accurate and sufficient paired TCR sequence data. NetTCR-2.0 is publicly available at https://services.healthtech.dtu.dk/service.php?NetTCR-2.0.
T cell receptors (TCR) define the specificity of T cells and are responsible for their interaction with peptide antigen targets presented in complex with major histocompatibility complex (MHC) molecules. Understanding the rules underlying this interaction hence forms the foundation for our understanding of basic adaptive immunology. Over the last decade, efforts have been dedicated to developing assays for high throughput identification of peptide-specific TCRs. Based on such data, several computational methods have been proposed for predicting the TCR-pMHC interaction. The general conclusion from these studies is that the prediction of TCR interactions with MHC-peptide complexes remains highly challenging. Several reasons form the basis for this including scarcity and quality of data, and ill-defined modeling objectives imposed by the high redundancy of the available data. In this work, we propose a framework for dealing with this redundancy, allowing us to address essential questions related to the modeling of TCR specificity including the use of peptide- versus pan-specific models, how to best define negative data, and the performance impact of integrating of CDR1 and 2 loops. Further, we illustrate how and why it is strongly recommended to include simple similarity-based modeling approaches when validating an improved predictive power of machine learning models, and that such validation should include a performance evaluation as a function of “distance” to the training data, to quantify the potential for generalization of the proposed model. The conclusion of the work is that, given current data, TCR specificity is best modeled using peptide-specific approaches, integrating information from all 6 CDR loops, and with negative data constructed from a combination of true and mislabeled negatives. Comparing such machine learning models to similarity-based approaches demonstrated an increased performance gain of the former as the “distance” to the training data was increased; thus demonstrating an improved generalization ability of the machine learning-based approaches. We believe these results demonstrate that the outlined modeling framework and proposed evaluation strategy form a solid basis for investigating the modeling of TCR specificities and that adhering to such a framework will allow for faster progress within the field. The final devolved model, NetTCR-2.1, is available at https://services.healthtech.dtu.dk/service.php?NetTCR-2.1.
Many different solutions to predicting the cognate epitope target of a T-cell receptor (TCR) have been proposed. However several questions on the advantages and disadvantages of these different approaches remain unresolved, as most methods have only been evaluated within the context of their initial publications and data sets. Here, we report the findings of the first public TCR-epitope prediction benchmark performed on 23 prediction models in the context of the ImmRep 2022 TCR-epitope specificity workshop. This benchmark revealed that the use of paired-chain alpha-beta, as well as CDR1/2 or V/J information, when available, improves classification obtained with CDR3 data, independent of the underlying approach. In addition, we found that straight-forward distance-based approaches can achieve a respectable performance when compared to more complex machine-learning models. Finally, we highlight the need for a truly independent follow-up benchmark and provide recommendations for the design of such a next benchmark.
Pairing of the T cell receptor (TCR) with its cognate peptide-MHC (pMHC) is a cornerstone in T cell-mediated immunity. Recently, single-cell sequencing coupled with DNA-barcoded multimer staining has enabled high-throughput studies of T cell specificities. However, the immense variability of TCR-pMHC interactions combined with the relatively low signal-to-noise ratio in the data generated using current technologies are complicating these studies. Several approaches have been proposed for denoising single-cell TCR-pMHC specificity data. Here, we present a benchmark evaluating two such denoising methods, ICON and iTRAP. We applied and evaluated the methods on publicly available immune profiling data provided by 10x Genomics. We analyzed both internal metrics developed for the purpose and performance on independent data using machine learning methods trained on the raw and denoised 10x data. We find an increased signal-to-noise ratio comparing the denoised to the raw data for both methods, and demonstrate an overall superior performance of the iTRAP method compared with the ICON method, in terms of both data consistency and performance. In conclusion, this study demonstrates that Improving the data quality by optimizing signal yield from high throughput studies of TCRpMHC-specificity is paramount in increasing our understanding of T cell-mediated immunity.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.