We demonstrate the use of a Generative Adversarial Network (GAN), trained on a set of over 400,000 light and heavy chain human antibody sequences, to learn the rules of human antibody formation. The resulting model surpasses common in silico techniques by capturing residue diversity throughout the variable region, and is capable of generating extremely large, diverse libraries of novel antibodies that mimic the somatically hypermutated human repertoire response. This method permits us to rationally design de novo humanoid antibody libraries with explicit control over various properties of our discovery library. Through transfer learning, we are able to bias the GAN to generate molecules with key properties of interest such as improved stability and developability, lower predicted MHC Class II binding, and specific complementarity-determining region (CDR) characteristics. These approaches also provide a mechanism to better study the complex relationships between antibody sequence and molecular behavior, both in vitro and in vivo. We validate our method by successfully expressing a proof-of-concept library of nearly 100,000 GAN-generated antibodies via phage display. We present the sequences and homology-model structures of example generated antibodies expressed in stable CHO pools and evaluated across multiple biophysical properties. The creation of discovery libraries using our in silico approach allows for the control of pharmaceutical properties such that these therapeutic antibodies can provide a more rapid and cost-effective response to biological threats.
The maximum entropy method (MEM) provides a self-modeling fit to data in which minimization of the χ² goodness-of-fit parameter is coupled with maximization of a statistical entropy function. We have found that MEM provides an excellent visual description of the uncertainties, errors, and limitations associated with the distributions which it recovers. To more accurately interpret fluorescence lifetime distributions recovered by the MEM from frequency domain lifetime data, a detailed examination of the effects of frequency range, noise, data set size, and sample heterogeneity was carried out for both simulated and real data. Results clearly demonstrate that the frequency range in which data are collected can affect the number and nature of the fluorescence lifetime components that are recovered by MEM, and the quality of the data at the frequencies that are optimal for a given lifetime is also crucial. Expansion of sufficient data sets to include more frequencies, or more replicates at the same frequencies, provides little improvement over the original data set when the lifetimes are well-windowed by the frequency range. Synergism among multiple components in a sample can affect the recovered distribution, by shifting and splitting poorly windowed components and broadening the recovered peaks for all components. These effects are related to the number of components for which evidence must be found.
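The coupling of χ² minimization with entropy maximization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a discrete grid of lifetimes with fractional intensities summing to 1, the standard sine and cosine transform expressions for phase and modulation, a Shannon-form entropy, and a single multiplier `lam` that trades fit against entropy. The function names, the error levels `sigma_phase` and `sigma_mod`, and the example lifetimes and frequencies are all hypothetical.

```python
import math

def phase_modulation(omega, taus, fracs):
    """Phase (rad) and modulation at angular frequency omega for a discrete
    lifetime distribution; fracs are fractional intensities summing to 1."""
    n = sum(f * omega * t / (1.0 + (omega * t) ** 2) for t, f in zip(taus, fracs))
    d = sum(f / (1.0 + (omega * t) ** 2) for t, f in zip(taus, fracs))
    return math.atan2(n, d), math.hypot(n, d)

def chi_squared(omegas, taus, fracs, phases, mods,
                sigma_phase=0.0035, sigma_mod=0.004):
    """Reduced chi-squared over phase and modulation data.
    The sigma defaults are assumed error levels, not values from the source."""
    total = 0.0
    for w, ph, m in zip(omegas, phases, mods):
        ph_c, m_c = phase_modulation(w, taus, fracs)
        total += ((ph - ph_c) / sigma_phase) ** 2 + ((m - m_c) / sigma_mod) ** 2
    return total / (2 * len(omegas))

def shannon_entropy(fracs):
    """Shannon entropy of the distribution (one assumed choice of entropy function)."""
    return -sum(f * math.log(f) for f in fracs if f > 0.0)

def mem_objective(omegas, taus, fracs, phases, mods, lam=1.0):
    """Quantity to minimize: chi-squared minus lam times entropy (assumed form)."""
    return chi_squared(omegas, taus, fracs, phases, mods) - lam * shannon_entropy(fracs)

# Hypothetical example: lifetimes in ns, modulation frequencies in MHz.
taus = [0.5, 1.0, 2.0, 4.0, 8.0]
true_fracs = [0.0, 0.0, 1.0, 0.0, 0.0]   # a single 2 ns component
omegas = [2.0 * math.pi * f_mhz * 1e-3 for f_mhz in (10.0, 30.0, 100.0, 300.0)]  # rad/ns
data = [phase_modulation(w, taus, true_fracs) for w in omegas]
phases = [p for p, _ in data]
mods = [m for _, m in data]

flat_fracs = [0.2] * 5                   # maximum-entropy (uniform) guess
chi2_true = chi_squared(omegas, taus, true_fracs, phases, mods)
chi2_flat = chi_squared(omegas, taus, flat_fracs, phases, mods)
```

The sketch only evaluates the objective. A full MEM fit would minimize `mem_objective` over `fracs` with a constrained optimizer, adjusting `lam` until the fit reaches a target χ². In this noiseless example the uniform guess has the highest entropy but a poor χ², while the true distribution fits exactly with zero entropy, which is the tension the method balances.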
Multivariate curve resolution (MCR) is a powerful technique for extracting chemical information from measured spectra of complex mixtures. A modified MCR technique that utilized both measured and second-derivative spectra to account for observed sample-to-sample variability attributable to changes in soil reflectivity was used to estimate the spectrum of dibutyl phosphate (DBP) adsorbed on two different soil types. This algorithm was applied directly to measurements of reflection spectra of soils coated with analyte without resorting to soil preparations such as grinding or dilution in potassium bromide. The results provided interpretable spectra that can be used to guide strategies for detection and classification of organic analytes adsorbed on soil. Comparisons to the neat DBP liquid spectrum showed that the recovered analyte spectra from both soils contained spectral features from methyl, methylene, hydroxyl, and P=O functional groups, but most conspicuous was the absence of the strong PO-(CH2)3CH3 stretch absorption at 1033 cm⁻¹. These results are consistent with those obtained previously using extended multiplicative scatter correction.
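For readers unfamiliar with MCR, the basic idea can be sketched as alternating least squares with non-negativity constraints, factoring a data matrix of mixture spectra into concentration profiles and pure-component spectra. The Python sketch below shows only this generic MCR-ALS scheme under simplifying assumptions. It does not implement the modified algorithm described above, which also uses second-derivative spectra. The function name `mcr_als`, the synthetic Gaussian bands, and the choice of initial spectra are hypothetical.

```python
import numpy as np

def mcr_als(D, S_init, n_iter=50):
    """Generic MCR-ALS sketch: factor D (samples x channels) as C @ S
    with C >= 0 (concentrations) and S >= 0 (component spectra)."""
    S = np.clip(np.asarray(S_init, dtype=float), 0.0, None)
    for _ in range(n_iter):
        # Least-squares solve for C with S fixed: S.T @ C.T ~ D.T
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)            # enforce non-negative concentrations
        # Least-squares solve for S with C fixed: C @ S ~ D
        S = np.linalg.lstsq(C, D, rcond=None)[0]
        S = np.clip(S, 0.0, None)            # enforce non-negative spectra
    return C, S

# Synthetic example (assumed data): two Gaussian bands mixed in five samples.
x = np.linspace(0.0, 1.0, 200)
S_true = np.vstack([
    np.exp(-((x - 0.3) / 0.05) ** 2),
    np.exp(-((x - 0.7) / 0.08) ** 2),
])
C_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.4], [0.3, 0.7], [0.8, 0.2]])
D = C_true @ S_true

# Initialize from the first two measured spectra, which are pure in this toy data set.
C_est, S_est = mcr_als(D, S_init=D[:2])
rel_resid = np.linalg.norm(D - C_est @ S_est) / np.linalg.norm(D)
```

Clipping negative values after each least-squares step is the simplest way to impose non-negativity; dedicated non-negative least-squares solvers are a common alternative. Real reflectance data would not contain pure samples, so the initial spectra would normally come from a separate estimate, and convergence from a poor starting point is not guaranteed.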