Unconstrained generation of synthetic antibody-antigen structures to guide machine learning methodology for real-world antibody specificity prediction

Robert, P. A.; Akbar, Rahmad; Frank, Robert; Pavlović, Milena; Widrich, Michael; Snapkov, Igor; Chernigovskaya, Maria; Scheffer, Lonneke; Slabodkin, Andrei; Mehta, Brij Bhushan; Vu, M. H.; Prósz, Aurél; Abram, Krzysztof Jan; Olar, Alex; Miho, Enkelejda; Haug, D. T. T.; Lund-Johansen, Fridtjof; Hochreiter, Sepp; Haff, I. H.; Klambauer, Günter; Sandve, G. K.; Greiff,

doi:10.1101/2021.07.06.451258

Cited by 17 publications

(67 citation statements)

References 155 publications

(359 reference statements)

“…Machine learning is increasingly used for AIRR classification both on the sequence (Greiff et al 2017b; Isacchini et al 2021; Akbar et al 2021a; Robert et al 2021a) and repertoire level (Emerson et al 2017; Shemesh et al 2021; Sidhom et al 2021), as well as for antibody generation (Friedensohn et al 2020; Akbar et al 2021b). Future studies will need to investigate whether differences in RGM also impact repertoire classification (Greiff et al 2020; Rodriguez et al 2020; Kanduri et al 2021).…”

Section: Discussion (mentioning)

confidence: 99%

Individualized VDJ recombination predisposes the available Ig sequence space

et al. 2021

Self Cite

The process of recombination between variable (V), diversity (D), and joining (J) immunoglobulin (Ig) gene segments determines an individual's naive Ig repertoire and, consequently, (auto)antigen recognition. VDJ recombination follows probabilistic rules that can be modeled statistically. So far, it remains unknown whether VDJ recombination rules differ between individuals. If these rules differed, identical (auto)antigen-specific Ig sequences would be generated with individual-specific probabilities, signifying that the available Ig sequence space is individual specific. We devised a sensitivity-tested distance measure that enables inter-individual comparison of VDJ recombination models. We discovered, accounting for several sources of noise as well as allelic variation in Ig sequencing data, that not only unrelated individuals but also human monozygotic twins and even inbred mice possess statistically distinguishable immunoglobulin recombination models. This suggests that, in addition to genetic, there is also nongenetic modulation of VDJ recombination. We demonstrate that population-wide individualized VDJ recombination can result in orders of magnitude of difference in the probability to generate (auto)antigen-specific Ig sequences. Our findings have implications for immune receptor–based individualized medicine approaches relevant to vaccination, infection, and autoimmunity.

“…So far, rule inference via, for instance, attribution methods, remains a challenge and is poorly standardized. 54, 89…”

Section: Learnability of Antibody–Antigen Binding (mentioning)

confidence: 99%

“…The outer red ring represents the number of antibody sequences in the iReceptor database (the largest publicly available sequence data), 53 the outer purple ring the number of synthetic antibody–antigen binding structures from Absolut! (the largest publicly available synthetic antibody–antigen structural dataset), 54 the outer blue ring displays the number of structures from AbDb (curated antibody–antigen structural data 55 obtained from the Protein Data Bank), 56 and the outer grey ring represents developability information. 52 Inner rings illustrate information about antibody–antigen complexes, Ig repertoire, therapeutic antibodies, and paratope and epitope data.…”

Section: Introduction (mentioning)

confidence: 99%

“…Recently, we presented our efforts to increase the amount of 3D-structure data by six orders of magnitude larger than the 1200 structures available experimentally (Figure 2) 55 via simulating virtual coarse-grained docking of billions of antibody-antigen pairs with several layers of biological complexity. 54 We complemented this data with in silico predicted developability parameters to create datasets that encompass the three aforementioned key design parameters: paratope-epitope binding, affinity, and developability. 27 Such efforts have begun to increase the number of datasets to a level where the benchmarking of data-intensive methods, such as deep learning to study antibody-antigen binding at the paratope-epitope level as well as deep learning-based antibody sequence generation, started to become feasible.…”

Section: Introduction (mentioning)

confidence: 99%

“…27 Such efforts have begun to increase the number of datasets to a level where the benchmarking of data-intensive methods, such as deep learning to study antibody-antigen binding at the paratope-epitope level as well as deep learning-based antibody sequence generation, started to become feasible. 27, 54 More generally, large-scale 3D-atomistic resolution data generation may represent the next major step where abundantly available antibody sequence data will be leveraged to obtain large quantities of antibody-antigen complexes via recent advances in computational structural biology methods such as antibody modeling, 59–63 molecular docking, 64–67 and molecular dynamics. 68, 69…”

Section: Introduction (mentioning)

confidence: 99%

Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies

Akbar, Bashour, Rawat, et al. 2022

mAbs

Self Cite

Although the therapeutic efficacy and commercial success of monoclonal antibodies (mAbs) are tremendous, the design and discovery of new candidates remain a time and cost-intensive endeavor. In this regard, progress in the generation of data describing antigen binding and developability, computational methodology, and artificial intelligence may pave the way for a new era of in silico on-demand immunotherapeutics design and discovery. Here, we argue that the main necessary machine learning (ML) components for an in silico mAb sequence generator are: understanding of the rules of mAb-antigen binding, capacity to modularly combine mAb design parameters, and algorithms for unconstrained parameter-driven in silico mAb sequence synthesis. We review the current progress toward the realization of these necessary components and discuss the challenges that must be overcome to allow the on-demand ML-based discovery and design of fit-for-purpose mAb therapeutic candidates.
