Next-generation sequencing of antibody transcripts provides a wealth of data, but the ability to identify function-specific antibodies solely on the basis of sequence has remained elusive. We previously characterized the VRC01 class of antibodies, which target the CD4-binding site on gp120, appear in multiple donors, and broadly neutralize HIV-1. Antibodies of this class have developmental commonalities, but typically share only ∼50% amino acid sequence identity among different donors. Here we apply next-generation sequencing to identify VRC01 class antibodies in a new donor, C38, directly from B cell transcript sequences. We first tested a lineage rank approach, but this was unsuccessful, likely because VRC01 class antibody sequences were not highly prevalent in this donor. We next identified VRC01 class heavy chains through a phylogenetic analysis that included thousands of sequences from C38 and a few known VRC01 class sequences from other donors. This "cross-donor analysis" yielded heavy chains with little sequence homology to previously identified VRC01 class heavy chains. Nonetheless, when reconstituted with the light chain from VRC01, half of the heavy chain chimeric antibodies showed substantial neutralization potency and breadth. We then identified VRC01 class light chains through a five-amino-acid sequence motif necessary for VRC01 light chain recognition. From over a million light chain sequences, we identified 13 candidate VRC01 class members. Pairing of these light chains with the phylogenetically identified C38 heavy chains yielded functional antibodies that effectively neutralized HIV-1. 
Bioinformatics analysis can thus directly identify functional HIV-1-neutralizing antibodies of the VRC01 class from a sequenced antibody repertoire.

antibodyomics | cross-donor phylogenetic analysis | DNA sequencing | humoral immune response | sequence signature

The heavy and light chain sequences of an antibody determine its antigen-specific recognition (1-3), and a long-standing problem in structural bioinformatics has been to predict the recognition of an antibody based solely on its sequence. This problem of sequence-based recognition can be separated into two structural components: (1) determining recognition from structure and (2) determining structure from sequence. Both of these components remain active areas of inquiry, with the latter representing the famous "protein-folding problem" (4, 5). For antibodies, the overall structure of immunoglobulins is known, and recognition is generally determined by six loops, the complementarity-determining regions (CDRs). Despite this reduced complexity, antibodies display diversity >10^12 in each individual and distinguish epitopes with high precision. Thus, although the general problem of predicting recognition from sequence remains intractable, a number of strategies are now being developed to determine recognition from antibody sequence.

First, population-based strategies: if a particular antibody sequence is highly prevalent, biological considerations can suggest a...