With advances in technology over the past decade, the amount of data generated and recorded has grown enormously in virtually all fields of industry and science. This extraordinary amount of data provides unprecedented opportunities for data-driven decision-making and knowledge discovery. However, analyzing such large-scale datasets poses significant challenges and calls for innovative statistical methods designed specifically for speed and efficiency. In this chapter, we review currently available methods for big data, with a focus on subsampling methods that use statistical leveraging and on divide-and-conquer methods.
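As a rough illustration of subsampling with statistical leveraging, the sketch below fits a straight line to a leverage-weighted subsample instead of the full dataset. It is a minimal example under stated assumptions, not the chapter's own procedure: the function names (`leverage_scores`, `weighted_line_fit`, `leverage_subsample_fit`) are hypothetical, the model is restricted to one predictor plus an intercept so that the leverage scores have a closed form, and the data are synthetic.

```python
import random


def leverage_scores(x):
    """Leverage of each point in simple linear regression (intercept + slope).

    For this model the hat-matrix diagonal has the closed form
    h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2.
    """
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def leverage_subsample_fit(x, y, r, seed=0):
    """Draw r points with probability proportional to leverage, then fit.

    Each sampled point is weighted by 1 / (its sampling probability) so that
    the subsample objective approximates the full-data least-squares objective.
    """
    rng = random.Random(seed)
    h = leverage_scores(x)
    total = sum(h)  # equals the number of parameters (2) up to rounding
    probs = [hi / total for hi in h]
    idx = rng.choices(range(len(x)), weights=probs, k=r)
    xs = [x[i] for i in idx]
    ys = [y[i] for i in idx]
    ws = [1.0 / probs[i] for i in idx]
    return weighted_line_fit(xs, ys, ws)


# Synthetic, noise-free data (illustrative only): y = 1 + 2x.
x = [i / 10 for i in range(200)]
y = [1 + 2 * xi for xi in x]
intercept, slope = leverage_subsample_fit(x, y, r=30)
```

Points far from the mean of `x` have higher leverage and are therefore sampled more often; the inverse-probability weights compensate for that unequal sampling.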
Deep generative models applied to the generation of novel compounds in small-molecule drug design have attracted considerable attention in recent years. To design compounds that interact with specific target proteins, we propose a Generative Pre-Trained Transformer (GPT)-inspired model for de novo target-specific molecular design. By implementing different keys and values for the multi-head attention conditional on a specified target, the proposed method can generate drug-like compounds both with and without a specific target. The results show that our approach (cMolGPT) is capable of generating SMILES strings that correspond to both drug-like and active compounds. Moreover, the compounds generated from the conditional model closely match the chemical space of real target-specific molecules and cover a significant portion of novel compounds. Thus, the proposed Conditional Generative Pre-Trained Transformer (cMolGPT) is a valuable tool for de novo molecule design and has the potential to shorten the molecular optimization cycle.
Purpose Obstructive sleep apnea (OSA) results in systemic intermittent hypoxia. In one model, hypoxic stress signaling in OSA patients alters the levels of the inflammatory soluble cytokines TNF and IL6, damages the blood-brain barrier, and activates microglial targeting of neuronal cell death, increasing the risk of neurodegenerative disorders and other diseases. However, it is not yet clear whether OSA significantly alters the levels of the soluble isoforms of the TNF receptors TNFR1 and TNFR2, the IL6 receptor (IL6R), and the co-receptor gp130, which have the potential to modulate TNF and IL6 signaling. Methods Picogram per milliliter levels of the soluble isoforms of these four cytokine receptors were estimated in OSA patients, in OSA patients receiving airways therapy, and in healthy control subjects. Triplicate samples were examined using Bio-Plex fluorescent bead microfluidic technology. The statistical significance of cytokine data was estimated using the nonparametric Wilcoxon rank-sum test. The clustering of these high-dimensional data was visualized using t-distributed stochastic neighbor embedding (t-SNE). Results OSA patients had significant twofold to sevenfold reductions in the soluble serum isoforms of all four cytokine receptors, gp130, IL6R, TNFR1, and TNFR2, as compared with control individuals (p = 1.8 × 10⁻¹³ to 4 × 10⁻⁸). Relative to untreated OSA patients, OSA patients receiving airways therapy had significantly higher levels of gp130 (p = 2.8 × 10⁻¹³), IL6R (p = 1.1 × 10⁻⁹), TNFR1 (p = 2.5 × 10⁻¹⁰), and TNFR2 (p = 5.7 × 10⁻⁹), levels indistinguishable from controls (p = 0.29 to 0.95). The data for most airway-treated patients clustered with healthy controls, but the data for a few airway-treated patients clustered with apneic patients. Conclusions Patients with OSA have aberrantly low levels of four soluble cytokine receptors associated with neurodegenerative disease: gp130, IL6R, TNFR1, and TNFR2. Most OSA patients receiving airways therapy have receptor levels indistinguishable from healthy controls, suggesting that chronic intermittent hypoxia may be one of the factors contributing to low receptor levels in untreated OSA patients.
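For readers unfamiliar with the Wilcoxon rank-sum test named in the Methods, the following is a minimal sketch of a two-sided version using the normal approximation. It is an illustration only and not the study's analysis code: the function name `rank_sum_test` is hypothetical, the variance carries no tie correction, and the example values are invented placeholders rather than measured receptor levels.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). Tied values receive the average of the ranks they span.
    The variance formula below does not apply a tie correction.
    """
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(a), len(b)
    # Rank sum of the first sample and its null mean and standard deviation.
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Invented placeholder values (not study data): group_a is uniformly lower.
group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
group_b = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
z, p = rank_sum_test(group_a, group_b)
```

Because the test uses only ranks, it makes no normality assumption about the underlying concentrations, which is why it suits skewed cytokine measurements.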
Newly developed data acquisition technologies make complex and massive datasets easy to access. Although smoothing spline ANOVA models have proven useful in a variety of fields, such datasets pose challenges for applying these models. In this chapter, we present a selected review of smoothing spline ANOVA models and highlight some challenges and opportunities in massive datasets. We review two approaches that significantly reduce the computational cost of fitting the model. One real case study is used to illustrate the performance of the reviewed methods.
Rheological properties of food materials are important as they influence food texture, processing properties, and stability. Rotational rheometry has been widely used for measuring rheological properties. However, the measurements obtained using different geometries and rheometers are generally not compared for precision and accuracy, so it is difficult to compare data across different studies. In this study, nine rheometers from seven laboratories were used to measure the viscosity and viscoelastic properties of a commercial salad dressing. The measurements were obtained at three temperatures (8, 25, and 60 °C) using parallel plates of different diameters (20, 40, 50, and 60 mm). Generally, the viscosity measurements among rheometers differed significantly (P < 0.05). For larger geometry diameters (40, 50, and 60 mm) and at the lowest temperature (8 °C), viscosity measurements at lower shear rates (0.01, 0.1, and 1.0 s⁻¹) were significantly different. Rheometer brand significantly affected storage modulus only at low (0.01%) and high (10% and 100%) levels of strain. Temperature influenced viscoelastic behavior only at high strain (>10%). Storage modulus values obtained by frequency sweeps were not affected by rheometer or plate diameter. Overall, rheometer, geometry, and temperature can influence rheological measurements, and care should be taken when comparing data across laboratories or published works. Higher shear rates (≥10 s⁻¹) and moderate strains (0.1% to 10%) generally provide more repeatable data among different laboratories. Practical Application This study provides information on what factors may potentially influence rheological measurements conducted across different laboratories. It is useful for rheometer users who want to compare their experimental data to published data or compare two sets of published data. It is better to compare data collected at shear rates ≥10 s⁻¹ and strains between 0.1% and 1.0%.
Background Protein kinases are a large family of druggable proteins that are genomically and proteomically altered in many human cancers. Kinase-targeted drugs are emerging as promising avenues for personalized medicine because of the differential response shown by altered kinases to drug treatment in patients and cell-based assays. However, an incomplete understanding of the relationships connecting genome, proteome, and drug sensitivity profiles presents a major bottleneck in targeting kinases for personalized medicine. Results In this study, we propose a multi-component Quantitative Structure–Mutation–Activity Relationship Tests (QSMART) model and neural networks framework for providing explainable models of protein kinase inhibition and drug response (IC₅₀) profiles in cell lines. Using non-small cell lung cancer as a case study, we show that interaction terms that capture associations between drugs, pathways, and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib). In particular, protein–protein interactions associated with the JNK apoptotic pathway, associations between lung development and axon extension, and interaction terms connecting drug substructures and the volume/charge of mutant residues at specific structural locations contribute significantly to the observed IC₅₀ values in cell-based assays. Conclusions By integrating multi-omics data in the QSMART model, we not only predict drug responses in cancer cell lines with high accuracy but also identify features and explainable interaction terms contributing to the accuracy. Although we have tested our multi-component explainable framework on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles.
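To make the notion of an interaction term concrete, the sketch below builds pairwise product features between drug descriptors and mutation descriptors, which a regression model could then test for significance. This is a generic illustration of interaction features and not the QSMART implementation: the function name `interaction_features`, the feature names, and the numeric values are all hypothetical.

```python
from itertools import product


def interaction_features(drug_feats, mutation_feats):
    """Return one product feature per (drug feature, mutation feature) pair.

    An interaction term is nonzero only when both of its parent features are
    nonzero, so it captures effects that depend on the combination of a drug
    property and a mutation property rather than on either one alone.
    """
    return {
        f"{d_name}*{m_name}": d_val * m_val
        for (d_name, d_val), (m_name, m_val) in product(
            drug_feats.items(), mutation_feats.items()
        )
    }


# Hypothetical inputs: binary substructure flags for a drug and
# physicochemical changes at a mutated residue in a cell line.
drug = {"substructure_A": 1, "substructure_B": 0}
mutation = {"residue_volume_change": 0.8, "residue_charge_change": -1.0}

feats = interaction_features(drug, mutation)
```

With two drug features and two mutation features this yields four candidate terms; in practice the number of pairs grows quickly, which is why a selection step is needed before model fitting.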
Predicting drug sensitivity profiles from genotypes is a major challenge in personalized medicine. Machine learning and deep neural network methods have shown promise in addressing this challenge, but the "black-box" nature of these methods precludes a mechanistic understanding of how and which genomic and proteomic features contribute to the observed drug sensitivity profiles. Here we provide a combined statistical and neural network framework that not only estimates drug IC₅₀ in cancer cell lines with high accuracy (R² = 0.861 and RMSE = 0.818) but also identifies features contributing to the accuracy, thereby enhancing explainability. Our framework, termed QSMART, uses a multi-component approach that includes (1) collecting drug fingerprints, cancer cell lines' multi-omics features, and drug responses, (2) testing the statistical significance of interaction terms, (3) selecting features by Lasso with the Bayesian information criterion, and (4) using neural networks to predict drug response. We evaluate the contribution of each of these components and use a case study to explain the biological relevance of several selected features to protein kinase inhibitor response in non-small cell lung cancer cells. Specifically, we illustrate how interaction terms that capture associations between drugs and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib) in non-small cell lung cancer cells. Although we have tested QSMART on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles. Introduction Protein kinases are a class of signaling proteins, greatly valued as therapeutic targets for their key roles in human diseases, such as cancer [1]. For decades, chemotherapy has served as part of a standard set of cancer treatments; however, the resistance of cancer cells to chemotherapy is still a major clinical challenge [2]. Mutations in protein kinases are known to play important roles not only in drug resistance [3] but also in drug sensitivity [4]. Depending on the structural location, mutations can have varying impacts on drug sensitivity. For example, non-small cell lung cancer (NSCLC) cells harboring the EGFR T790M or L858R mutation show resistance or hypersensitivity, respectively, to the cancer drug gefitinib [5, 6], while those with the EGFR T790M/L858R double mutant are only resistant to gefitinib [7]. As mutations impact the efficacy of different cancer drugs, there is a need to incorporate structural knowledge in drug response prediction methods. To facilitate the understanding of the molecular mechanisms that cause drug sensitivity and drug resistance in cancer cells, the Genomics of Drug Sensitivity in Cancer (GDSC) Project [8] recently screened the drug responses of 266 anti-cancer drugs against ∼1,000 human cancer cell lines and provided the largest publicly available drug response datas...