Recent studies have revealed that immune repertoires contain a substantial fraction of public clones, which may be defined as antibody (Ab) or T-cell receptor (TCR) clonal sequences shared across individuals. It has remained unclear whether public clones possess predictable sequence features that differentiate them from private clones, which are believed to be generated largely stochastically. This knowledge gap represents a lack of insight into the shaping of immune repertoire diversity. Leveraging a machine learning approach capable of capturing the high-dimensional compositional information of each clonal sequence (defined by CDR3), we detected predictive public- and private-clone-specific immunogenomic differences concentrated in the CDR3's N1-D-N2 region, which allowed the prediction of public and private status with 80% accuracy in humans and mice. Our results unexpectedly demonstrate that public, as well as private, clones possess predictable high-dimensional immunogenomic features. Our support vector machine model could be trained effectively on large published datasets (3 million clonal sequences) and was sufficiently robust for public clone prediction across individuals and across studies that used different library preparation and high-throughput sequencing protocols. In summary, we have uncovered the existence of high-dimensional immunogenomic rules that shape immune repertoire diversity in a predictable fashion. Our approach may pave the way for the construction of a comprehensive atlas of public mouse and human immune repertoires with potential applications in rational vaccine design and immunotherapeutics.
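The public/private definition above can be made concrete with a minimal sketch. It assumes a clone is identified by its CDR3 string alone and is called public when it occurs in at least two individuals; the threshold, the function name `label_clones`, and the toy sequences are illustrative assumptions, not details taken from the study.

```python
from collections import defaultdict


def label_clones(repertoires, min_individuals=2):
    """Map each CDR3 sequence to 'public' or 'private'.

    repertoires: dict of individual id -> iterable of CDR3 sequences.
    A clone counts as public when it is seen in at least
    `min_individuals` individuals (an assumed threshold).
    """
    carriers = defaultdict(set)
    for individual, cdr3s in repertoires.items():
        for cdr3 in cdr3s:
            # A set ignores repeated observations within one individual.
            carriers[cdr3].add(individual)
    return {
        cdr3: "public" if len(ids) >= min_individuals else "private"
        for cdr3, ids in carriers.items()
    }


# Toy, made-up sequences used purely for illustration.
repertoires = {
    "donor_A": ["CARDYW", "CARGGW", "CARDYW"],
    "donor_B": ["CARDYW", "CTRSSW"],
}
labels = label_clones(repertoires)
```

Labels produced this way would serve as the two classes for a classifier; how the authors actually matched clones across individuals is not stated in the abstract.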
KeBABS provides a powerful, flexible and easy-to-use framework for KErnel-Based Analysis of Biological Sequences in R. It includes efficient implementations of the most important sequence kernels, including variants that take sequence annotations and positional information into account. KeBABS seamlessly integrates three common support vector machine (SVM) implementations behind a unified interface. It supports hyperparameter selection by cross validation and nested cross validation, and also features grouped cross validation. The biological interpretation of SVM models is supported by (1) the computation of weights of sequence patterns and (2) prediction profiles that highlight the contributions of individual sequence positions or sections.
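As an illustration of what a sequence kernel computes, the sketch below implements a position-independent spectrum (k-mer) kernel in plain Python. It is not KeBABS code and does not reproduce its R interface; the function names, the default k = 3, and the cosine normalization are assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3, normalize=True):
    """Spectrum kernel: dot product of the two k-mer count vectors.

    With normalize=True the value is divided by the product of the
    vector norms, so identical sequences score 1.0.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers that are absent from b.
    dot = sum(n * counts_b[kmer] for kmer, n in counts_a.items())
    if not normalize:
        return float(dot)
    norm = sqrt(sum(n * n for n in counts_a.values())
                * sum(n * n for n in counts_b.values()))
    return dot / norm if norm else 0.0


# Toy sequences, made up for illustration.
same = spectrum_kernel("CARDYW", "CARDYW")
shared = spectrum_kernel("CARDYW", "CARGGW", normalize=False)
```

A matrix of such pairwise values is what a kernel SVM consumes. Because the feature space here is explicit k-mer counts, the weight of each k-mer in a trained linear model can be read off directly, which is the idea behind the pattern weights mentioned above.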
Recent studies have revealed that immune repertoires contain a substantial fraction of public clones, which are defined as antibody or T-cell receptor (TCR) clonal sequences shared across individuals. As of yet, it has remained unclear whether public clones possess predictable sequence features that separate them from private clones, which are believed to be generated largely stochastically. This knowledge gap represents a lack of insight into the shaping of immune repertoire diversity. Leveraging a machine learning approach capable of capturing the high-dimensional compositional information of each clonal sequence (defined by the complementarity determining region 3, CDR3), we detected predictive public- and private-clone-specific immunogenomic differences concentrated in the CDR3's N1-D-N2 region, which allowed the prediction of public and private status with 80% accuracy in both humans and mice. Our results unexpectedly demonstrate that not only public but also private clones possess predictable high-dimensional immunogenomic features. Our support vector machine model could be trained effectively on large published datasets (3 million clonal sequences) and was sufficiently robust for public clone prediction across studies prepared with different library preparation and high-throughput sequencing protocols. In summary, we have uncovered the existence of high-dimensional immunogenomic rules that shape immune repertoire diversity in a predictable fashion. Our approach may pave the way towards the construction of a comprehensive atlas of public clones in immune repertoires, which may have applications in rational vaccine design and immunotherapeutics.