The efficient recognition of pathogens by the adaptive immune system relies on the diversity of receptors displayed on the surface of immune cells. T-cell receptor diversity results from an initial random DNA editing process, called VDJ recombination, followed by functional selection of cells according to the interaction of their surface receptors with self and foreign antigenic peptides. To quantify the effect of selection on the highly variable elements of the receptor, we apply a probabilistic maximum likelihood approach to the analysis of high-throughput sequence data from the β-chain of human T-cell receptors. We quantify selection factors for V and J gene choice, and for the length and amino-acid composition of the variable region. Our approach is necessary to disentangle the effects of selection from biases inherent in the recombination process. Inferred selection factors differ little between donors, or between naive and memory repertoires. The number of sequences shared between donors is well predicted by the model, indicating a purely stochastic origin of such "public" sequences. We find a significant correlation between biases induced by VDJ recombination and our inferred selection factors, together with a reduction of diversity during selection. Both effects suggest that natural selection acting on the recombination process has anticipated the selection pressures experienced during somatic evolution.
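The abstract does not spell out the functional form of the model. As a minimal sketch, assume a simplified version in which selection multiplies the generation probability of a sequence by a factor Q = q_V(v) · q_L(L) that depends only on its V gene and CDR3 length. Maximum likelihood then reduces to choosing factors under which the reweighted generated sequences reproduce the observed V-gene and length frequencies. The sketch represents the recombination model by a sample of generated (V gene, length) pairs and solves these matching conditions by iterative proportional fitting, which is a swapped-in optimizer. The function names, the two-feature restriction, and the toy data are illustrative assumptions, not the authors' implementation, which also covers J genes and amino-acid composition.

```python
from collections import Counter


def reweighted_marginal(generated, q_v, q_l, idx):
    """Frequency of each value of feature `idx` (0 = V gene, 1 = CDR3 length)
    in the generated sample after weighting every sequence by Q = q_V * q_L."""
    weights = [q_v[gene] * q_l[length] for gene, length in generated]
    total = sum(weights)
    marginal = Counter()
    for seq, w in zip(generated, weights):
        marginal[seq[idx]] += w / total
    return marginal


def fit_selection_factors(generated, observed, n_iter=100):
    """Fit hypothetical selection factors q_V and q_L by iterative proportional fitting.

    generated: (v_gene, cdr3_length) tuples sampled from a recombination model.
    observed:  (v_gene, cdr3_length) tuples from the sequenced repertoire.
    Assumes every observed V gene and length also occurs in `generated`.
    """
    n_obs = len(observed)
    data = (
        {g: c / n_obs for g, c in Counter(g for g, _ in observed).items()},
        {n: c / n_obs for n, c in Counter(n for _, n in observed).items()},
    )
    q_v = {gene: 1.0 for gene, _ in generated}
    q_l = {length: 1.0 for _, length in generated}
    for _ in range(n_iter):
        for idx, q in ((0, q_v), (1, q_l)):
            model = reweighted_marginal(generated, q_v, q_l, idx)
            for x in q:
                # Rescale so this feature's reweighted marginal equals the data;
                # values never observed are driven to q = 0 (the ML limit).
                if model[x] > 0:
                    q[x] *= data[idx].get(x, 0.0) / model[x]
                else:
                    q[x] = 0.0
    # Fix the overall scale so Q averages to 1 over the generated sample (Z = 1).
    z = sum(q_v[gene] * q_l[length] for gene, length in generated) / len(generated)
    q_v = {gene: value / z for gene, value in q_v.items()}
    return q_v, q_l
```

As a usage check, one can resample a generated set with hidden V- and length-dependent weights, fit the resampled "observed" set, and confirm that the reweighted marginals match it. In this toy setting, factors near 1 for every value would indicate little selection beyond the recombination biases already captured by the generated sample.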
Significance statement

The immune system defends against pathogens via a diverse population of T-cells that display different antigen-recognition surface receptor proteins. Receptor diversity is produced by an initial random gene recombination process, followed by selection for a desirable range of peptide binding. Although recombination is well understood, selection has not been quantitatively characterized. By combining high-throughput sequencing data with modeling, we quantify the selection pressure that shapes functional repertoires. Selection is found to vary little between individuals or between the naive and memory repertoires. It reinforces the biases of the recombination process, meaning that sequences more likely to be produced are also more likely to pass selection. The model accounts for "public" sequences shared between individuals as the result of pure chance.

The T-cell response of the adaptive immune system begins when receptor proteins on the surface of T-cells recognize a pathogen peptide presented by an antigen-presenting cell. The immune cell repertoire of a given individual comprises many clones, each with a distinct surface receptor. This diversity, which is central to the ability of the immune system to defeat pathogens, is initially created by a stochastic process of germline DNA editing (called VDJ recombination) that gives each new immune cell a unique surface receptor gene. This initial repertoire is subsequently modified by selective forces, including thymic selection against excessive (or insufficient) recognition of self proteins, that are also stochastic in nature....