“…Most are constructed based on data from the IEDB, VDJdb, and/or McPAS-TCR and, in addition to the epitope information, make use of either CDR3β sequences alone [13–15], a mixture of CDR3α and CDR3β sequences [16], or smaller data sets entailing all six CDR sequences and potentially additional cellular information [17,18]. Methodologically, the different studies range from simple CDR3β alignment-based methods [19,22], through CDR similarity-weighted distances such as TCRdist [7], k-mer feature spaces in combination with PCA and decision trees (SETE [13]), random forests [20,21] such as TCRex [23], CNN-based methods (ImRex [16]), and Gaussian process classification methods (TCRGP [17]), to more complex approaches integrating natural language processing (NLP) methods (ERGO [14]). The overall conclusion from these earlier works is that while the prediction of TCR specificity is feasible, the volume and accuracy of current data limit the performance of the developed models.…”