An adaptive soft‐sensor for advanced real‐time monitoring of an antibody‐drug conjugation reaction

Self Cite

Chemometric modeling of spectral data is considered a key technology in biopharmaceutical processing for realizing real-time process control and release testing. Machine learning (ML) models have been shown to increase the accuracy of various spectral regression and classification tasks and to remove challenging preprocessing steps for spectral data, and they promise improved model transferability compared to commonly applied linear methods. However, training and optimizing ML models requires large data sets, which are not available in the context of biopharmaceutical processing. Generative methods that extend data sets with realistic in silico samples, so-called data augmentation, may provide the means to alleviate this challenge. In this study, we develop and implement a novel data augmentation method that generates in silico spectral data based on local estimation of pure component profiles, and we use it to train convolutional neural network (CNN) models on four data sets. We simultaneously tune the hyperparameters of the data augmentation and of the neural network architecture using Bayesian optimization. Finally, we compare the optimized CNN models with partial least-squares (PLS) regression models in terms of accuracy, robustness, and interpretability. The proposed data augmentation method is shown to produce highly realistic spectral data by adapting the estimates of the pure component profiles to the sampled concentration regimes. Augmenting CNNs with the in silico spectral data is shown to improve the prediction accuracy for the quantification of monoclonal antibody (mAb) size variants by up to 50% in comparison to single-response PLS models. Bayesian structure optimization suggests that multiple convolutional blocks are beneficial for model accuracy and enable transfer across different data sets. Model-agnostic feature importance methods and synthetic noise perturbation are used to directly compare the optimized CNNs with PLS models. 
This enables the identification of wavelength regions critical for model performance and suggests that the CNNs are more robust than the PLS models against Gaussian white noise and wavelength shifts.

Section: Discussion (mentioning; confidence: 99%)

Section: Introduction (mentioning; confidence: 99%)
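The augmentation idea in the abstract above (synthesizing spectra from pure component profiles that are estimated locally for the sampled concentration regime) can be illustrated with a minimal NumPy sketch. It assumes a linear, Beer-Lambert-type mixing model X ≈ C·K and uses classical least squares on the k nearest training samples as a stand-in for the local estimation step; the function names, the nearest-neighbour rule, and the uniform concentration sampling are illustrative assumptions, not the published method. The two perturbation helpers mimic, in the same hedged spirit, the Gaussian white noise and wavelength shift tests mentioned in the abstract.

```python
import numpy as np

def estimate_pure_profiles(C, X):
    """Classical least squares: solve X ~ C @ K for the pure component profiles K."""
    K, *_ = np.linalg.lstsq(C, X, rcond=None)
    return K  # shape (n_components, n_wavelengths)

def augment_local(C_train, X_train, n_samples, k_neighbors=10, noise_sd=0.0, rng=None):
    """Generate in silico spectra from locally estimated pure component profiles.

    Assumption (illustrative only): for each sampled concentration vector, the
    profiles are re-estimated from its k nearest training samples in
    concentration space, so they adapt to the local concentration regime.
    """
    rng = np.random.default_rng(rng)
    low, high = C_train.min(axis=0), C_train.max(axis=0)
    C_new = rng.uniform(low, high, size=(n_samples, C_train.shape[1]))
    X_new = np.empty((n_samples, X_train.shape[1]))
    for i, c in enumerate(C_new):
        nearest = np.argsort(np.linalg.norm(C_train - c, axis=1))[:k_neighbors]
        K_local = estimate_pure_profiles(C_train[nearest], X_train[nearest])
        X_new[i] = c @ K_local + rng.normal(0.0, noise_sd, X_train.shape[1])
    return C_new, X_new

def add_white_noise(X, sd, rng=None):
    """Perturb spectra with additive Gaussian white noise of standard deviation sd."""
    rng = np.random.default_rng(rng)
    return X + rng.normal(0.0, sd, X.shape)

def shift_wavelengths(X, shift):
    """Shift every spectrum by `shift` channels using linear interpolation.

    Positive shifts move features toward higher channel indices; values outside
    the measured range are held at the edge values.
    """
    idx = np.arange(X.shape[1], dtype=float)
    return np.array([np.interp(idx - shift, idx, row) for row in X])

# Usage on hypothetical synthetic two-component data.
rng = np.random.default_rng(0)
K_true = rng.random((2, 50))
C = rng.uniform(0.0, 1.0, size=(40, 2))
X = C @ K_true
C_aug, X_aug = augment_local(C, X, n_samples=5, k_neighbors=8, rng=1)
X_noisy = add_white_noise(X_aug, sd=0.01, rng=2)
X_shifted = shift_wavelengths(X_aug, shift=1.5)
```

With noise-free linear training data, the locally estimated profiles reproduce the true ones, so the augmented spectra equal C_aug @ K_true. On real spectra, k_neighbors trades local adaptivity against noise in the estimated profiles.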

Generative data augmentation and automated optimization of convolutional neural networks for process monitoring

Schiemer, Rüdt, Hubbuch (2024)

Self Cite

“…To effectively reduce the noise in the model, a larger data set, preferably from fed-batch experiments, would be required. Eventually, due to the nonlinear relationship of the Raman spectra and the VLP concentration, non-linear regression models should be evaluated, such as kernel-based methods (Thissen et al., 2004; Barman et al., 2010; Zavala-Ortiz et al., 2020; Schiemer et al., 2023) or neural networks (Cui and Fearn, 2018; Wang et al., 2023; Schiemer et al., 2024).…”

Section: Effects of Preprocessing Pipeline on Model Performance (mentioning; confidence: 99%)
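The snippet above recommends evaluating kernel-based methods when the spectral response is non-linear in the analyte concentration. As a minimal illustration, the NumPy sketch below fits an RBF kernel ridge regression and an ordinary linear least-squares model to a hypothetical one-feature example in which the concentration depends quadratically on a band intensity; the data, the class name, and the gamma and alpha values are assumptions chosen only for demonstration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

class RBFKernelRidge:
    """Kernel ridge regression: solve (K + alpha*I) a = y, predict k(x, X_train) @ a."""

    def __init__(self, alpha=1e-3, gamma=10.0):
        self.alpha, self.gamma = alpha, gamma

    def fit(self, X, y):
        self.X_train_ = X
        K = rbf_kernel(X, X, self.gamma)
        self.dual_coef_ = np.linalg.solve(K + self.alpha * np.eye(len(X)), y)
        return self

    def predict(self, X):
        return rbf_kernel(X, self.X_train_, self.gamma) @ self.dual_coef_

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical non-linear calibration: concentration = intensity**2.
x_train = np.linspace(0.0, 1.0, 30)[:, None]
y_train = x_train[:, 0] ** 2
x_test = np.linspace(0.02, 0.98, 20)[:, None]
y_test = x_test[:, 0] ** 2

# Linear baseline with intercept, fitted by least squares.
A_train = np.column_stack([x_train[:, 0], np.ones(len(x_train))])
A_test = np.column_stack([x_test[:, 0], np.ones(len(x_test))])
beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
rmse_linear = rmse(y_test, A_test @ beta)

krr = RBFKernelRidge(alpha=1e-3, gamma=10.0).fit(x_train, y_train)
rmse_kernel = rmse(y_test, krr.predict(x_test))
```

For real spectra, scikit-learn's `sklearn.kernel_ridge.KernelRidge` and `sklearn.gaussian_process.GaussianProcessRegressor` provide maintained implementations; gamma and alpha would normally be chosen by cross-validation rather than fixed as here.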

Raman-based PAT for VLP precipitation: systematic data diversification and preprocessing pipeline identification

Dietrich, Schiemer, Kurmann, et al. (2024)

Self Cite

Virus-like particles (VLPs) are a promising class of biopharmaceuticals for vaccines and targeted delivery. Starting from clarified lysate, VLPs are typically captured by selective precipitation. While VLP precipitation is induced by step-wise or continuous precipitant addition, current monitoring approaches do not support direct product quantification, and analytical methods usually require various time-consuming processing and sample preparation steps. Here, Raman spectroscopy combined with chemometric methods may allow the simultaneous quantification of the precipitated VLPs and the precipitant, owing to its demonstrated advantages in analyzing crude, complex mixtures. In this study, we present a Raman spectroscopy-based Process Analytical Technology (PAT) tool developed on batch and fed-batch precipitation experiments of Hepatitis B core antigen VLPs. We conducted small-scale precipitation experiments, providing a diversified data set with varying precipitation dynamics and backgrounds induced by initial dilution or spiking of clarified Escherichia coli-derived lysates. For the Raman spectroscopy data, various preprocessing operations were systematically combined, allowing the identification of a preprocessing pipeline that effectively eliminated initial lysate composition variations as well as most interferences attributed to precipitates and to the precipitant present in solution. The calibrated partial least squares models predicted the precipitant concentration with R² of 0.98 and 0.97 in batch and fed-batch experiments, respectively, and captured the observed precipitation trends with R² of 0.74 and 0.64. 
Although the resolution of fine differences between experiments was limited due to the observed non-linear relationship between spectral data and the VLP concentration, this study provides a foundation for employing Raman spectroscopy as a PAT sensor for monitoring VLP precipitation processes with the potential to extend its applicability to other phase-behavior dependent processes or molecules.
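The systematic combination of preprocessing operations described above can be sketched as an exhaustive search over ordered stages. The stage names, the three candidate operations (moving-average smoothing, first derivative, standard normal variate), and the scoring rule below are hypothetical placeholders; the abstract does not list the operations the study actually combined, and in practice the score would be a cross-validated metric of the downstream PLS model.

```python
import itertools
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def first_derivative(X):
    """Finite-difference first derivative along the spectral axis."""
    return np.gradient(X, axis=1)

def moving_average(X, window=5):
    """Moving-average smoothing; zero padding attenuates the outermost channels."""
    kernel = np.ones(window) / window
    return np.array([np.convolve(row, kernel, mode="same") for row in X])

# Hypothetical stages, applied in this order; None means "skip this stage".
STAGES = {
    "smoothing": [None, moving_average],
    "derivative": [None, first_derivative],
    "scatter_correction": [None, snv],
}

def all_pipelines(stages=STAGES):
    """Yield every ordered combination of one option per stage (2**3 = 8 here)."""
    for combo in itertools.product(*stages.values()):
        yield tuple(step for step in combo if step is not None)

def apply_pipeline(X, pipeline):
    for step in pipeline:
        X = step(X)
    return X

def max_abs_correlation(Xp, y, tol=1e-9):
    """Placeholder score: largest |Pearson r| between y and any single channel."""
    Xc = Xp - Xp.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(Xp.shape[1]), where=denom > tol)
    return float(np.max(np.abs(r)))

def best_pipeline(X, y, score_fn, stages=STAGES):
    """Score every pipeline and return (best_score, best_pipeline)."""
    scored = [(score_fn(apply_pipeline(X, p), y), p) for p in all_pipelines(stages)]
    return max(scored, key=lambda item: item[0])

# Hypothetical data: one band whose height scales with concentration y,
# on top of a sample-specific constant baseline (varying background).
rng = np.random.default_rng(0)
wn = np.linspace(0.0, 1.0, 200)
band = np.exp(-((wn - 0.5) / 0.05) ** 2)
y = rng.uniform(0.0, 1.0, 30)
X = y[:, None] * band + rng.uniform(0.0, 5.0, 30)[:, None]

score, pipeline = best_pipeline(X, y, max_abs_correlation)
```

In this toy case the derivative removes the constant baselines, whereas SNV also normalizes away the concentration-dependent band height of a single-band spectrum, which illustrates why such combinations are worth testing rather than assuming.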

“…Due to high correlation within the data set, the spectra produced by the above-stated techniques are commonly processed using chemometric techniques, e.g., principal component analysis (PCA) (Simone et al., 2014a), partial least squares (PLS) regression models (Simone et al., 2014a), or Gaussian process regression (GPR) (Schiemer et al., 2023), just to name a few. Further explanations of chemometric methods can be found in the published literature (Wold et al., 2001; Chen et al., 2007; Bro and Smilde, 2014; Acquarelli et al., 2017).…”

Section: Introduction (mentioning; confidence: 99%)
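The R² values reported in the abstract above are coefficients of determination, R² = 1 - Σ(y - ŷ)² / Σ(y - ȳ)². The NumPy sketch below pairs that formula with a textbook single-response PLS (PLS1) fit via the NIPALS algorithm, the model class named in the abstract and the snippet. It is a generic illustration on hypothetical two-component spectra, not the calibration used in the study, and the function names are illustrative.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS regression via NIPALS; returns coefficients and means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # mean-centred residual matrices
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        q_a = (yr @ t) / tt                  # y loading
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - q_a * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    return (X - x_mean) @ coef + y_mean

def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical example: predict component 0 in the presence of component 1.
rng = np.random.default_rng(0)
K = rng.random((2, 100))
C = rng.uniform(0.0, 1.0, size=(25, 2))
X = C @ K + rng.normal(0.0, 1e-3, size=(25, 100))
coef, x_mean, y_mean = pls1_fit(X, C[:, 0], n_components=2)
r2 = coefficient_of_determination(C[:, 0], pls1_predict(X, coef, x_mean, y_mean))
```

scikit-learn's `sklearn.cross_decomposition.PLSRegression` and `sklearn.metrics.r2_score` offer maintained implementations of both; the number of latent variables would normally be selected by cross-validation, and R² would be reported on held-out data.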

Spectroscopic insights into multi-phase protein crystallization in complex lysate using Raman spectroscopy and a particle-free bypass

Wegner, Eming, Walla, et al. (2024)

Self Cite

Protein crystallization, as opposed to well-established chromatography processes, offers the benefit of reduced production costs while reaching comparably high purity. However, monitoring crystallization processes remains a challenge, as the produced crystals may interfere with analytical measurements. Especially when capturing proteins from complex feedstock containing various impurities, establishing reliable process analytical technology (PAT) to monitor protein crystallization processes can be complicated. In heterogeneous mixtures, important product characteristics can be identified by multivariate analysis and chemometrics, thus contributing to the development of a thorough process understanding. In this project, an analytical set-up is established that combines off-line analytics, on-line ultraviolet-visible (UV/Vis) spectroscopy, and in-line Raman spectroscopy to monitor a stirred-batch crystallization process in which multiple phases and species are present. As an example process, the enzyme Lactobacillus kefir alcohol dehydrogenase (LkADH) was crystallized from clarified Escherichia coli (E. coli) lysate at 300 mL scale in five distinct experiments, with the experimental conditions varying in the initial lysate solution preparation method and the precipitant concentration. Since UV/Vis spectroscopy is sensitive to particles, a cross-flow filtration-based bypass enabled on-line analysis of the liquid phase, providing information on the nucleic acid to protein ratio of the lysate. A principal component analysis (PCA) of in situ Raman spectra supported the identification of spectra and wavenumber ranges associated with product-specific information and revealed that the experiments followed a comparable spectral trend when crystals were present. Based on preprocessed Raman spectra, a partial least squares (PLS) regression model was optimized to monitor the target molecule concentration in real time. 
The off-line sample analysis provided information on the crystal number and crystal geometry by automated image analysis, as well as on the concentrations of LkADH and host cell proteins (HCPs). In spite of a complex lysate suspension containing scattering crystals and various impurities, it was possible to monitor the target molecule concentration in a heterogeneous, multi-phase process using spectroscopic methods. With the presented analytical set-up of off-line, particle-sensitive on-line, and in-line analyzers, a crystallization capture process can be better characterized in terms of the geometry, yield, and purity of the crystals.
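The abstract above uses PCA to single out wavenumber ranges that carry product-specific information. One common way to do this, sketched below with NumPy under stated assumptions, is to compute PCA by singular value decomposition of mean-centred spectra and flag contiguous wavenumber ranges where the absolute loading of a component exceeds a fraction of its maximum. The 50% threshold, the helper names, and the synthetic single-band spectra are illustrative choices, not those of the study.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of mean-centred data; returns scores, loadings, explained ratios."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]            # shape (n_components, n_wavenumbers)
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

def high_loading_ranges(loading, wavenumbers, rel_threshold=0.5):
    """Contiguous wavenumber ranges where |loading| >= rel_threshold * max|loading|."""
    mask = np.abs(loading) >= rel_threshold * np.max(np.abs(loading))
    ranges, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            ranges.append((wavenumbers[start], wavenumbers[i - 1]))
            start = None
    if start is not None:
        ranges.append((wavenumbers[start], wavenumbers[-1]))
    return ranges

# Hypothetical spectra: one Raman band at 1000 cm^-1 whose intensity varies.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400.0, 1800.0, 701)
band = np.exp(-((wavenumbers - 1000.0) / 20.0) ** 2)
intensity = rng.uniform(0.5, 2.0, 40)
X = intensity[:, None] * band + rng.normal(0.0, 1e-3, size=(40, 701))

scores, loadings, explained = pca(X, n_components=2)
ranges = high_loading_ranges(loadings[0], wavenumbers)
```

Because the sign of a principal component is arbitrary, the sketch thresholds absolute loadings; with real Raman data, several components and overlapping bands would usually need to be inspected together.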