Although the therapeutic efficacy and commercial success of monoclonal antibodies (mAbs) are tremendous, the design and discovery of new candidates remain a time- and cost-intensive endeavor. In this regard, progress in the generation of data describing antigen binding and developability, in computational methodology, and in artificial intelligence may pave the way for a new era of in silico on-demand immunotherapeutics design and discovery. Here, we argue that the main machine learning (ML) components necessary for an in silico mAb sequence generator are: understanding of the rules of mAb-antigen binding, capacity to modularly combine mAb design parameters, and algorithms for unconstrained parameter-driven in silico mAb sequence synthesis. We review the current progress toward the realization of these necessary components and discuss the challenges that must be overcome to allow the on-demand ML-based discovery and design of fit-for-purpose mAb therapeutic candidates.
Background: Lung cancer is the leading cause of cancer deaths worldwide, and lung adenocarcinoma is the most common form of lung cancer. To understand the molecular basis of lung adenocarcinoma, integrative analyses have been performed using genomics, transcriptomics, epigenomics, proteomics, and clinical data. In addition, molecular prognostic signatures have been generated for lung adenocarcinoma using gene expression levels in tumor samples. However, we need signatures that include different types of molecular data, and even cohort- or patient-based biomarkers that are candidates for molecular targeting. Results: We built an R pipeline to carry out an integrated meta-analysis of genomic alterations (single-nucleotide variations and copy number variations), transcriptomic variations measured by RNA-seq, and clinical data of patients with lung adenocarcinoma in The Cancer Genome Atlas (TCGA) project. To construct a prognosis signature, we integrated significant genes carrying single-nucleotide variations or copy number variations, differentially expressed genes, and genes in active subnetworks. A Cox proportional hazards model with a lasso penalty and leave-one-out cross-validation (LOOCV) was used to identify the best gene signature among the different gene categories. We determined a 12-gene signature (BCHE, CCNA1, CYP24A1, DEPTOR, MASP2, MGLL, MYO1A, PODXL2, RAPGEF3, SGK2, TNNI2, ZBTB16) for prognostic risk prediction based on the overall survival time of patients with lung adenocarcinoma. The patients in both the training and test data were clustered into high-risk and low-risk groups using risk scores calculated from the selected gene signature. The overall survival probability of these risk groups differed highly significantly in both the training and test datasets.
Conclusions: This 12-gene signature could predict the prognostic risk of patients with lung adenocarcinoma in TCGA, and its genes are potential predictors for survival-based risk clustering of these patients. These genes can be used to cluster patients based on their molecular nature, and the best drug candidates for each patient cluster can then be proposed. These genes also have high potential for targeted cancer therapy of patients with lung adenocarcinoma.
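As an illustration of how risk scores from a gene signature can split patients into two groups, the following Python sketch computes a linear risk score (coefficient times expression, summed over the signature genes) and uses the median score as the cutoff. The coefficients, expression values, patient IDs, and the median cutoff are all assumptions made for illustration; the abstract does not report them, and the original pipeline was written in R.

```python
from statistics import median

# Hypothetical coefficients for three of the signature genes. Real values
# would come from the fitted lasso-penalized Cox model, not from this sketch.
coefficients = {"BCHE": -0.12, "CCNA1": 0.08, "CYP24A1": 0.15}

# Hypothetical normalized expression values per (hypothetical) patient.
patients = {
    "P1": {"BCHE": 2.0, "CCNA1": 1.0, "CYP24A1": 0.5},
    "P2": {"BCHE": 0.5, "CCNA1": 3.0, "CYP24A1": 2.0},
    "P3": {"BCHE": 1.5, "CCNA1": 0.5, "CYP24A1": 1.0},
    "P4": {"BCHE": 0.2, "CCNA1": 2.5, "CYP24A1": 3.0},
}


def risk_score(expression, coefs):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def assign_risk_groups(patients, coefs):
    """Score every patient and split at the median score (assumed cutoff)."""
    scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
    return scores, groups


scores, groups = assign_risk_groups(patients, coefficients)
for pid in patients:
    print(pid, round(scores[pid], 3), groups[pid])
```

In practice the two groups would then be compared with a survival analysis such as a log-rank test, which this sketch leaves out.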
Lung cancer is the second most frequently diagnosed cancer type and is responsible for the highest number of cancer deaths worldwide. Lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are subtypes of non-small-cell lung cancer, which accounts for the largest share of lung cancer cases. We aimed to analyze genomic and transcriptomic variations, including simple nucleotide variations (SNVs), copy number variations (CNVs), and differentially expressed genes (DEGs), to find key genes and pathways for diagnostic and prognostic prediction in LUAD and LUSC. We fitted a univariate Cox model and then a lasso-regularized Cox model with leave-one-out cross-validation using The Cancer Genome Atlas (TCGA) gene expression data from tumor samples. We generated 35- and 33-gene signatures for prognostic risk prediction based on the overall survival time of patients with LUAD and LUSC, respectively. When we clustered patients into high- and low-risk groups, the survival analysis showed highly significant results with high prediction power for both training and test datasets. We then characterized the differences, including significant SNVs, CNVs, DEGs, active subnetworks, and pathways. We described the results for the risk groups and cancer subtypes separately to identify specific genomic alterations between both high-risk groups and cancer subtypes. Both LUAD and LUSC high-risk groups have more downregulated immune pathways and upregulated metabolic pathways. In contrast, low-risk groups have both up- and downregulated genes in cancer-related pathways. Both LUAD and LUSC have important gene alterations, such as CDKN2A and CDKN2B deletions, with different frequencies. SOX2 amplification occurs in LUSC and PSMD4 amplification in LUAD. EGFR and KRAS mutations are mutually exclusive in LUAD samples. EGFR, MGA, SMARCA4, ATM, RBM10, and KDM5C genes are mutated only in LUAD and not in LUSC.
CDKN2A, PTEN, and HRAS genes are mutated only in LUSC samples. The low-risk groups of both LUAD and LUSC tend to have a higher number of SNVs, CNVs, and DEGs. The signature genes and altered genes have the potential to be used as diagnostic and prognostic biomarkers for personalized oncology.
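One common way to quantify a mutual-exclusivity claim such as the EGFR/KRAS observation is a one-sided Fisher exact test on the co-occurrence table. The abstract does not state which test the authors used, so the choice of test, the helper names, and the sample data below are all assumptions; this is a minimal sketch only.

```python
from math import comb


def cooccurrence_table(samples, gene_a, gene_b):
    """Count samples by mutation status of two genes.

    `samples` is a list of sets, each holding the mutated genes of one sample.
    Returns (both, only_a, only_b, neither).
    """
    both = sum(1 for m in samples if gene_a in m and gene_b in m)
    only_a = sum(1 for m in samples if gene_a in m and gene_b not in m)
    only_b = sum(1 for m in samples if gene_b in m and gene_a not in m)
    neither = len(samples) - both - only_a - only_b
    return both, only_a, only_b, neither


def exclusivity_p_value(both, only_a, only_b, neither):
    """One-sided Fisher exact test: P(co-occurrence count <= observed).

    Small values suggest the two genes co-occur less often than expected
    if their mutations were independent.
    """
    n = both + only_a + only_b + neither
    n_a = both + only_a
    n_b = both + only_b
    lowest = max(0, n_a + n_b - n)  # smallest possible overlap
    favorable = sum(
        comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(lowest, both + 1)
    )
    return favorable / comb(n, n_b)


# Hypothetical cohort of 20 samples; not data from the study.
samples = [{"EGFR"}] * 6 + [{"KRAS"}] * 8 + [set()] * 6
table = cooccurrence_table(samples, "EGFR", "KRAS")
p_value = exclusivity_p_value(*table)
print(table, round(p_value, 4))
```

With many gene pairs tested at once, the resulting p-values would also need a multiple-testing correction.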
IRF6, a member of the Interferon Regulatory Factor (IRF) family, is involved in orofacial and epidermal development. In breast cancer cell lines, ectopic expression of IRF6 reduces cell numbers, suggesting a role as a negative regulator of the cell cycle. IRF6 is a direct target of canonical Notch signaling in keratinocyte differentiation. Notch is involved in luminal cell fate determination and stem cell regulation in the normal breast and is implicated as an oncogene in breast cancer. Notch activation is sufficient to induce proliferation and transformation in the non-tumorigenic breast epithelial cell line MCF10A. ΔNp63, which is downregulated by Notch activation in the breast, regulates IRF6 expression in keratinocytes. In this report, we investigate Notch-IRF6 and ΔNp63-IRF6 interactions in MCF10A and MDA-MB-231 cells. We observed that in these cells, IRF6 expression is partially regulated by canonical Notch signaling and ΔNp63 downregulation. Furthermore, we demonstrate that IRF6 abrogation impairs Notch-induced proliferation and transformation in MCF10A cells. Thus, we confirm the previous findings by showing a tissue-independent regulation of IRF6 by Notch signaling, and we extend them by proposing a context-dependent role for IRF6, which acts as a positive regulator of proliferation and transformation in MCF10A cells downstream of Notch signaling.
Lung cancer is the second most frequently diagnosed cancer type and is responsible for the highest number of cancer deaths worldwide. Lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are subtypes of non-small cell lung cancer, which accounts for the largest share of lung cancer cases. We aimed to analyze genomic and transcriptomic variations, including simple nucleotide variations (SNVs), copy number variations (CNVs), and differentially expressed genes (DEGs), to find key genes and pathways for diagnostic and prognostic prediction in LUAD and LUSC. We fitted a univariate Cox model and then a lasso-regularized Cox model with leave-one-out cross-validation using The Cancer Genome Atlas (TCGA) gene expression data from tumor samples. We generated a 35-gene signature and a 33-gene signature for prognostic risk prediction based on the overall survival time of patients with LUAD and LUSC, respectively. When we clustered patients into high-risk and low-risk groups, the survival analysis showed highly significant results with high prediction power for both training and test datasets. We then characterized the differences, including significant SNVs, CNVs, DEGs, active subnetworks, and pathways. We described the results for the risk groups and cancer subtypes separately to identify specific genomic alterations between both high-risk groups and cancer subtypes. Both LUAD and LUSC high-risk groups have more downregulated immune pathways and upregulated metabolic pathways. In contrast, low-risk groups have both upregulated and downregulated genes in cancer-related pathways. Both LUAD and LUSC have important gene alterations, such as CDKN2A and CDKN2B deletions, with different frequencies. SOX2 amplification occurs in LUSC and PSMD4 amplification in LUAD. EGFR and KRAS mutations are mutually exclusive in LUAD samples. EGFR, MGA, SMARCA4, ATM, RBM10, and KDM5C genes are mutated only in LUAD and not in LUSC.
CDKN2A, PTEN, and HRAS genes are mutated only in LUSC samples. The low-risk groups of both LUAD and LUSC tend to have a higher number of SNVs, CNVs, and DEGs. The signature genes and altered genes have the potential to be used as diagnostic and prognostic biomarkers for personalized oncology.
Proteomics is the large-scale analysis of proteins and contributes to the understanding of gene function. Functional genomics, proteomics, and even metabolomics follow in the footsteps of genomics and are useful tools for expanding our knowledge of the biological hierarchy of transcription, translation, and the production of small molecules. Proteomics assesses a wide range of information, such as the structure, expression, localization, biochemical activity, interactions, posttranslational modifications, and cellular roles of proteins, following protein isolation, digestion, and mass spectrometry. As a significant post-genomic tool, proteomics allows researchers to decipher the molecular mechanisms underlying different metabolic pathways. Proteomics studies are mostly based on protein identification, mainly using bottom-up approaches such as DDA or MudPIT, which are examples of shotgun proteomics techniques. With high-throughput mass spectrometry, huge volumes of peptide spectra have been generated.
Background: Predicting lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) risk cohorts is a crucial step in precision oncology. Currently, clinicians and patients are informed about the patient's risk group via staging in the clinic. Several machine learning approaches have been applied to the stratification of LUAD and LUSC patients, but no study has assessed the integrated training of both clinical and genetic data for these two lung cancer types. Methods: We initially implemented five different machine learning algorithms (Support Vector Machine, Logistic Regression, Naive Bayes, Random Forest, and K Neighbors Classifiers) to evaluate the clinical features and mutated genes of patients and to develop a prognostic model that classifies LUAD and LUSC patients into high-risk and low-risk groups. Results: We identified a list of clinical features and somatically mutated genes that may be used to evaluate the prognosis of LUAD and LUSC patients for risk stratification in a clinical setting. As a result of this analysis, new genes such as KEAP1 for LUAD and CSMD3 for LUSC, among others, can be added to clinical decision processes. Conclusions: In current clinical practice, clinicians and patients are informed about the patient's risk group only through cancer staging. With the feature set we propose, clinicians and patients can assess a patient's risk group according to patient-specific clinical and molecular parameters. Our machine learning model may serve as a practical and reliable prognosis prediction tool for LUAD and LUSC and could provide novel insights into the underlying clinical and molecular mechanisms of LUAD and LUSC. Keywords: Machine Learning, Lung Adenocarcinoma, Lung Squamous Cell Carcinoma, Prognosis Prediction Model, TCGA, Multi-omics, Data Integration
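To make the idea of classifying patients from combined clinical and mutation features concrete, here is a minimal pure-Python sketch of one of the algorithm families named above, a k-nearest-neighbors classifier, scored with leave-one-out accuracy. The feature encoding, the toy data, the Hamming distance, and the leave-one-out evaluation are assumptions chosen for illustration; the abstract does not describe the authors' implementation or evaluation scheme.

```python
from collections import Counter

# Hypothetical binary features per patient:
# [advanced_stage, gene_A_mutated, gene_B_mutated], with a hypothetical label.
data = [
    ([1, 1, 1], "high"),
    ([1, 1, 0], "high"),
    ([1, 0, 1], "high"),
    ([1, 1, 1], "high"),
    ([0, 0, 0], "low"),
    ([0, 0, 1], "low"),
    ([0, 1, 0], "low"),
    ([0, 0, 0], "low"),
]


def hamming(a, b):
    """Number of positions at which two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority vote among the k training patients closest to the query."""
    neighbors = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each patient once, predict it from the rest, report accuracy."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(data)


accuracy = leave_one_out_accuracy(data)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

The toy data are deliberately well separated, so the accuracy here says nothing about performance on real cohorts.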
Predicting lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) risk status is a crucial step in precision oncology. In current clinical practice, clinicians and patients are informed about the patient's risk group only through cancer staging. Several machine learning approaches for stratifying LUAD and LUSC patients have recently been described; however, no study has yet compared the integrated modeling of clinical and genetic data from these two lung cancer types. In our work, we used a prognostic prediction model based on clinical and somatically altered gene features from 1026 patients to assess the relevance of features based on their impact on risk classification. By integrating the clinical features and somatically mutated genes of patients, we achieved the highest accuracy: 93% for LUAD and 89% for LUSC. Our second finding is that new prognostic genes, such as KEAP1 for LUAD and CSMD3 for LUSC, and new clinical factors, such as the site of resection, are significantly associated with risk stratification and can be integrated into clinical decision making. We validated the most important features on an independent RNA-seq dataset from NCBI GEO with survival information (GSE81089) and integrated our model into a user-friendly mobile application. Using this machine learning model and mobile application, clinicians and patients can assess a patient's survival risk from that patient's own clinical and molecular feature set.