We demonstrate the use of a Generative Adversarial Network (GAN), trained from a set of over 400,000 light and heavy chain human antibody sequences, to learn the rules of human antibody formation. The resulting model surpasses common in silico techniques by capturing residue diversity throughout the variable region, and is capable of generating extremely large, diverse libraries of novel antibodies that mimic somatically hypermutated human repertoire response. This method permits us to rationally design de novo humanoid antibody libraries with explicit control over various properties of our discovery library. Through transfer learning, we are able to bias the GAN to generate molecules with key properties of interest such as improved stability and developability, lower predicted MHC Class II binding, and specific complementarity-determining region (CDR) characteristics. These approaches also provide a mechanism to better study the complex relationships between antibody sequence and molecular behavior, both in vitro and in vivo. We validate our method by successfully expressing a proof-of-concept library of nearly 100,000 GAN-generated antibodies via phage display. We present the sequences and homology-model structures of example generated antibodies expressed in stable CHO pools and evaluated across multiple biophysical properties. The creation of discovery libraries using our in silico approach allows for the control of pharmaceutical properties such that these therapeutic antibodies can provide a more rapid and cost-effective response to biological threats.
The maximum entropy method (MEM) provides a self-modeling fit to data in which minimization of the χ² goodness-of-fit parameter is coupled with maximization of a statistical entropy function. We have found that MEM provides an excellent visual description of the uncertainties, errors, and limitations associated with the distributions which it recovers. To more accurately interpret fluorescence lifetime distributions recovered by the MEM from frequency domain lifetime data, a detailed examination of the effects of frequency range, noise, data set size, and sample heterogeneity was carried out for both simulated and real data. Results clearly demonstrate that the frequency range in which data are collected can affect the number and nature of the fluorescence lifetime components that are recovered by MEM, and the quality of the data at the frequencies that are optimal for a given lifetime is also crucial. Expansion of sufficient data sets to include more frequencies, or more replicates at the same frequencies, provides little improvement over the original data set when the lifetimes are well-windowed by the frequency range. Synergism among multiple components in a sample can affect the recovered distribution, by shifting and splitting poorly windowed components and broadening the recovered peaks for all components. These effects are related to the number of components for which evidence must be found.
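To make the idea of a lifetime being "well-windowed" by the frequency range concrete, the following is a minimal sketch of the standard frequency-domain forward model for a discrete set of lifetimes. It is an illustration only, not the analysis code used in the study. The function names `phase_modulation` and `window_frequency` are hypothetical, and the rule of thumb that a lifetime τ is best sampled near ωτ = 1 is an assumption adopted here for illustration.

```python
import numpy as np

def phase_modulation(freq_hz, lifetimes_s, fractions):
    """Phase shift (radians) and modulation for a discrete lifetime distribution.

    fractions are fractional intensities; they are normalized to sum to 1.
    """
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float)[:, None]  # (n_freq, 1)
    tau = np.asarray(lifetimes_s, dtype=float)[None, :]              # (1, n_tau)
    f = np.asarray(fractions, dtype=float)
    f = f / f.sum()
    wt = omega * tau
    # Sine (n) and cosine (d) transforms of the intensity decay
    n = (f * wt / (1.0 + wt**2)).sum(axis=1)
    d = (f / (1.0 + wt**2)).sum(axis=1)
    return np.arctan2(n, d), np.hypot(n, d)

def window_frequency(lifetime_s):
    """Modulation frequency (Hz) at which omega * tau = 1 (illustrative rule of thumb)."""
    return 1.0 / (2.0 * np.pi * lifetime_s)
```

For a single 4 ns component, `window_frequency(4e-9)` is roughly 40 MHz; at that frequency the model gives a 45° phase shift and a modulation of 1/√2. Frequencies far from this value carry little information about that component, which is one way to read the windowing effect described above.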
Multivariate curve resolution (MCR) is a powerful technique for extracting chemical information from measured spectra of complex mixtures. A modified MCR technique that utilized both measured and second-derivative spectra to account for observed sample-to-sample variability attributable to changes in soil reflectivity was used to estimate the spectrum of dibutyl phosphate (DBP) adsorbed on two different soil types. This algorithm was applied directly to measurements of reflection spectra of soils coated with analyte without resorting to soil preparations such as grinding or dilution in potassium bromide. The results provided interpretable spectra that can be used to guide strategies for detection and classification of organic analytes adsorbed on soil. Comparisons to the neat DBP liquid spectrum showed that the recovered analyte spectra from both soils contained spectral features from methyl, methylene, hydroxyl, and P=O functional groups, but most conspicuous was the absence of the strong PO-(CH2)3CH3 stretch absorption at 1033 cm⁻¹. These results are consistent with those obtained previously using extended multiplicative scatter correction.
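For readers unfamiliar with MCR, the sketch below shows the plain alternating least-squares form of the technique, in which a data matrix D (samples × spectral channels) is factored as D ≈ C S with non-negative concentrations C and component spectra S. This is a generic illustration and not the modified algorithm described above: it does not use second-derivative spectra or any correction for soil reflectivity. The function name `mcr_als`, the random initialization, and the use of clipping to enforce non-negativity are all assumptions made for the example.

```python
import numpy as np

def mcr_als(D, n_components, n_iter=100, seed=0):
    """Generic MCR by alternating least squares with non-negativity clipping.

    D: (n_samples, n_channels) data matrix.
    Returns C (n_samples, n_components) and S (n_components, n_channels).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_channels = D.shape
    # Hypothetical starting guess: random non-negative spectra
    S = rng.random((n_components, n_channels))
    C = np.zeros((n_samples, n_components))
    for _ in range(n_iter):
        # Solve D ~ C S for C with S fixed, then clip negatives to zero
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)
        # Solve D ~ C S for S with C fixed, then clip negatives to zero
        S = np.linalg.lstsq(C, D, rcond=None)[0]
        S = np.clip(S, 0.0, None)
    return C, S
```

Clipping after an unconstrained least-squares step is the simplest way to impose non-negativity; a true non-negative least-squares solver is a common alternative. As with any MCR factorization, the recovered components are subject to scaling and rotational ambiguity.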
A new technique, total lifetime distribution analysis (TLDA), is described for rapid, sensitive, and accurate lifetime characterization of complex samples. Multiharmonic Fourier transform technology in a commercial, frequency-domain fluorescence lifetime instrument allows rapid acquisition of TLDA data. High sensitivity derives from the use of the entire fluorescence emission from the sample in the lifetime measurement. The maximum entropy method (MEM) provides a consistent basis for modeling of the lifetime data for accurate recovery of the total lifetime distribution of the sample. Because MEM is self-modeling, it is not subject to the same sources of bias that influence nonlinear least-squares fits of lifetime data to a priori models. These features make TLDA an effective tool for sample characterization and fingerprinting that is based on the responsiveness of fluorescence lifetime to the chemical composition and dynamic processes that contribute to the uniqueness of the sample. TLDA results are presented for coal liquids and a humic substance. The effect of signal intensity on lifetime recovery is investigated, and comparison is made between MEM and conventional nonlinear least-squares for data analysis.
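The contrast between MEM and a priori least-squares models can be written compactly. The formulation below is one common way to state a maximum entropy fit to frequency-domain data, given as a sketch under stated assumptions: the Shannon form of the entropy, the Lagrange multiplier \(\lambda\), and the symbols \(p_i\), \(\phi_k\), \(m_k\), and \(\sigma\) are introduced here for illustration and are not taken from the abstract.

```latex
% p_i: fractional contribution of lifetime tau_i on a fixed grid (sum_i p_i = 1)
% phi_k, m_k: phase and modulation at the k-th modulation frequency
\[
  \chi^2 = \sum_k \left[
      \frac{\left(\phi_k^{\mathrm{obs}} - \phi_k^{\mathrm{calc}}\right)^2}{\sigma_{\phi,k}^2}
    + \frac{\left(m_k^{\mathrm{obs}} - m_k^{\mathrm{calc}}\right)^2}{\sigma_{m,k}^2}
  \right],
  \qquad
  S = -\sum_i p_i \ln p_i ,
\]
\[
  \text{maximize} \quad Q = S - \lambda \chi^2 \quad \text{over } \{p_i\},\; p_i \ge 0 .
\]
```

A nonlinear least-squares fit, by contrast, minimizes \(\chi^2\) alone over the parameters of a model whose number of components is chosen in advance. In the entropy-based form, no component count is imposed: among distributions that fit the data adequately, the one with the least structure is preferred.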