Adaptive immunity is mediated by lymphocyte B and T cells, which respectively express a vast and diverse repertoire of B cell and T cell receptors and, in conjunction with peptide antigen presentation through major histocompatibility complexes (MHCs), can recognize and respond to pathogens and diseased cells. In recent years, advances in deep sequencing have led to a massive increase in the amount of adaptive immune receptor repertoire data; additionally, proteomics techniques have led to a wealth of data on peptide–MHC presentation. These large-scale data sets are now making it possible to train machine and deep learning models, which can be used to identify complex and high-dimensional patterns in immune repertoires. This article introduces adaptive immune repertoires and machine and deep learning related to biological sequence data and then summarizes the many applications in this field, which span from predicting the immunological status of a host to the antigen specificity of individual receptors and the engineering of immunotherapeutics. Expected final online publication date for the Annual Review of Chemical and Biomolecular Engineering, Volume 12 is June 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
The continual evolution of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and the emergence of variants that show resistance to vaccines and neutralizing antibodies threaten to prolong the coronavirus disease 2019 (COVID-19) pandemic. Selection and emergence of SARS-CoV-2 variants are driven in part by mutations within the viral spike protein and in particular the ACE2 receptor-binding domain (RBD), a primary target site for neutralizing antibodies. Here, we develop deep mutational learning (DML), a machine learning-guided protein engineering technology, which is used to interrogate a massive sequence space of combinatorial mutations, representing billions of RBD variants, by accurately predicting their impact on ACE2 binding and antibody escape. A highly diverse landscape of possible SARS-CoV-2 variants is identified that could emerge from a multitude of evolutionary trajectories. DML may be used for predictive profiling on current and prospective variants, including highly mutated variants such as omicron (B.1.1.529), thus supporting decision making for public health as well as guiding the development of therapeutic antibody treatments and vaccines for COVID-19.
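To give a sense of why a combinatorial mutation space reaches billions of variants so quickly, the following is a minimal Python sketch that counts the distinct sequences reachable by substituting up to a given number of positions. The function name `count_variants` and its parameters are hypothetical and chosen for illustration only; the abstract does not state how many positions or mutations DML considers, so no specific values from the study are implied.

```python
from math import comb


def count_variants(n_positions: int, max_mutations: int, alphabet_size: int = 20) -> int:
    """Count sequences that differ from a reference at 1..max_mutations positions.

    Assumption (not from the source): each mutated position may take any of the
    other (alphabet_size - 1) amino acids, and positions are chosen freely.
    """
    return sum(
        # choose which k positions mutate, then pick a new residue at each one
        comb(n_positions, k) * (alphabet_size - 1) ** k
        for k in range(1, max_mutations + 1)
    )


if __name__ == "__main__":
    # Hypothetical region length; shows how fast the count grows with mutation depth.
    for depth in range(1, 7):
        print(depth, count_variants(n_positions=50, max_mutations=depth))
```

Because the count grows roughly as a power of the number of simultaneous mutations, exhaustive experimental screening becomes impractical, which is the usual motivation for training a predictive model on a sampled subset.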
Background
The continued spread of SARS-CoV-2 and the emergence of new variants with higher transmission rates and/or partial resistance to vaccines have further highlighted the need for large-scale testing and genomic surveillance. However, current diagnostic testing (e.g., PCR) and genomic surveillance methods (e.g., whole genome sequencing) are performed separately, thus limiting the detection and tracing of SARS-CoV-2 and emerging variants.
Results
Here, we developed DeepSARS, a high-throughput platform for simultaneous diagnostic detection and genomic surveillance of SARS-CoV-2 by the integration of molecular barcoding, targeted deep sequencing, and computational phylogenetics. DeepSARS enables highly sensitive viral detection, while also capturing genomic diversity and viral evolution. We show that DeepSARS can be rapidly adapted to identify emerging variants, such as the alpha, beta, gamma, and delta strains, and to profile mutational changes at the population level.
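As a rough illustration of how molecular barcodes can make sequencing read counts quantitative, the Python sketch below collapses reads that share a barcode and counts distinct barcodes per targeted region. This is a generic barcode-deduplication pattern, not the DeepSARS pipeline: the abstract does not describe how barcodes are processed, and the function name `count_unique_molecules`, the `(barcode, target)` read representation, and the example values are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_unique_molecules(reads: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct barcodes observed for each target region.

    Assumption (not from the source): each read has already been parsed into a
    (barcode, target) pair, and reads sharing both values are copies of the
    same original molecule.
    """
    barcodes_by_target = defaultdict(set)
    for barcode, target in reads:
        # a set keeps each barcode once, so duplicate reads collapse
        barcodes_by_target[target].add(barcode)
    return {target: len(barcodes) for target, barcodes in barcodes_by_target.items()}


if __name__ == "__main__":
    # Hypothetical reads: two are duplicates of the same barcoded molecule.
    example_reads = [
        ("AAC", "spike"),
        ("AAC", "spike"),
        ("GGT", "spike"),
        ("AAC", "nucleocapsid"),
    ]
    print(count_unique_molecules(example_reads))
```

A real workflow would also need to handle sequencing errors within barcodes, which this sketch deliberately ignores.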
Conclusions
DeepSARS sets the foundation for quantitative diagnostics that capture viral evolution and diversity.
Graphical abstract
DeepSARS uses molecular barcodes (BCs) and multiplexed targeted deep sequencing (NGS) to enable simultaneous diagnostic detection and genomic surveillance of SARS-CoV-2. Image was created using Biorender.com.