High-throughput immune repertoire sequencing is promising to lead to new statistical diagnostic tools for medicine and biology. Successful implementations of these methods require a correct characterization, analysis, and interpretation of these data sets. We present IGoR (Inference and Generation Of Repertoires)—a comprehensive tool that takes B or T cell receptor sequence reads and quantitatively characterizes the statistics of receptor generation from both cDNA and gDNA. It probabilistically annotates sequences and its modular structure can be used to investigate models of increasing biological complexity for different organisms. For B cells, IGoR returns the hypermutation statistics, which we use to reveal co-localization of hypermutations along the sequence. We demonstrate that IGoR outperforms existing tools in accuracy and estimate the sample sizes needed for reliable repertoire characterization.
We quantify the VDJ recombination and somatic hypermutation processes in human B cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, owing to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site.
The diversity of T-cell receptors recognizing foreign pathogens is generated through a highly stochastic recombination process, making the independent production of the same sequence rare. Yet unrelated individuals do share receptors, which together constitute a “public” repertoire of abundant clonotypes. The TCR repertoire is initially formed prenatally, when the enzyme inserting random nucleotides is downregulated, producing a limited diversity subset. By statistically analyzing deep sequencing T-cell repertoire data from twins, unrelated individuals of various ages, and cord blood, we show that T-cell clones generated before birth persist and maintain high abundances in adult organisms for decades, slowly decaying with age. Our results suggest that large, low-diversity public clones are created during pre-natal life, and survive over long periods, providing the basis of the public repertoire.
One contribution of 13 to a theme issue 'The dynamics of antibody repertoires'. We quantify the VDJ recombination and somatic hypermutation processes in human B cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, owing to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site.
The T-cell (TCR) repertoire relies on the diversity of receptors composed of two chains, called α and β, to recognize pathogens. Using results of high throughput sequencing and computational chain-pairing experiments of human TCR repertoires, we quantitively characterize the αβ generation process. We estimate the probabilities of a rescue recombination of the β chain on the second chromosome upon failure or success on the first chromosome. Unlike β chains, α chains recombine simultaneously on both chromosomes, resulting in correlated statistics of the two genes which we predict using a mechanistic model. We find that ∼35% of cells express both α chains. Altogether, our statistical analysis gives a complete quantitative mechanistic picture that results in the observed correlations in the generative process. We learn that the probability to generate any TCRαβ is lower than 10−12 and estimate the generation diversity and sharing properties of the αβ TCR repertoire.
The diversity of T-cell receptors recognizing foreign pathogens is generated through a highly stochastic recombination process, making the independent production of the same sequence rare. Yet unrelated individuals do share receptors, which together constitute a "public" repertoire of abundant clonotypes. The TCR repertoire is initially formed prenatally, when the enzyme inserting random nucleotides is downregulated, producing a limited diversity subset. By statistically analyzing deep sequencing T-cell repertoire data from twins, unrelated individuals of various ages, and cord blood, we show that T-cell clones generated before birth persist and maintain high abundances in adult organisms for decades, slowly decaying with age. Our results suggest that large, low-diversity public clones are created during pre-natal life, and survive over long periods, providing the basis of the public repertoire. Author summaryThe enormous diversity of T-cell receptor (TCR) molecules allows our adaptive immune system to recognize and fight infections. TCRs are formed through the stochastic rearrangement of DNA. By analysing human repertoire sequences of identical twins using a statistical model for TCR formation, we identified T-cells that were exchanged between twin embryos during pregnancy. We exploited the slightly different recombination statistics between fetal clonotypes and mature ones to track their relative fractions in adult Tcell repertoires of different ages. We showed that the decay of fetal clonotypes with age is extremely slow, spanning several decades. Our findings suggest that an important part of our adaptive immune system is formed before birth. PLOS Computational Biology | https://doi
The T-cell (TCR) repertoire relies on the diversity of receptors composed of two chains, called α and β, to recognize pathogens. Using results of high throughput sequencing and computational chain-pairing experiments of human TCR repertoires, we quantitively characterize the αβ generation process. We estimate the probabilities of a rescue recombination of the β chain on the second chromosome upon failure or success on the first chromosome. Unlike β chains, α chains recombine simultaneously on both chromosomes, resulting in correlated statistics of the two genes which we predict using a mechanistic model. We find that ∼ 28% of cells express both α chains. We report that clones sharing the same β chain but different α chains are overrepresented, suggesting that they respond to common immune challenges. Altogether, our statistical analysis gives a complete quantitative mechanistic picture that results in the observed correlations in the generative process. We learn that the probability to generate any TCRαβ is lower than 10 −12 making it very unlikely for two people to share a full receptor by chance. We also estimate the generation diversity of the full TCR repertoire.
Motivation: The diversity of the immune repertoire is initially generated by random rearrangements of the receptor gene during early T and B cell development. Rearrangement scenarios are composed of random events-choices of gene templates, base pair deletions and insertionsdescribed by probability distributions. Not all scenarios are equally likely, and the same receptor sequence may be obtained in several different ways. Quantifying the distribution of these rearrangements is an essential baseline for studying the immune system diversity. Inferring the properties of the distributions from receptor sequences is a computationally hard problem, requiring enumerating every possible scenario for every sampled receptor sequence.Results: We present a Hidden Markov model, which accounts for all plausible scenarios that can generate the receptor sequences. We developed and implemented a method based on the Baum-Welch algorithm that can efficiently infer the parameters for the different events of the rearrangement process. We tested our software tool on sequence data for both the alpha and beta chains of the T cell receptor. To test the validity of our algorithm, we also generated synthetic sequences produced by a known model, and confirmed that its parameters could be accurately inferred back from the sequences. The inferred model can be used to generate synthetic sequences, to calculate the probability of generation of any receptor sequence, as well as the theoretical diversity of the repertoire. We estimate this diversity to be %10 23 for human T cells. The model gives a baseline to investigate the selection and dynamics of immune repertoires. Availability and implementation: Source code and sample sequence files are available at https://bit bucket.org/yuvalel/repgenhmm/downloads.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.