Ensuring consistently high yields and product quality is a key challenge in biomanufacturing. Even minor deviations in critical process parameters (CPPs) such as media and feed compositions can significantly affect product critical quality attributes (CQAs). To identify CPPs and their interdependencies with product yield and CQAs, design of experiments and multivariate statistical approaches are typically used in industry. Although these models can predict the effect of CPPs on product yield, there is room to improve CQA prediction performance by capturing the complex relationships in high-dimensional data. In this regard, machine learning (ML) approaches offer immense potential in handling non-linear datasets and may thus identify new CPPs that effectively predict the CQAs. ML techniques can also be combined with mechanistic models as a ‘hybrid ML’ or ‘white box ML’ to identify how CPPs affect the product yield and quality mechanistically, thus enabling rational design and control of the bioprocess. In this review, we describe the role of statistical modeling in Quality by Design (QbD) for biomanufacturing, and provide a generic outline of how ML can be used to meaningfully analyze bioprocessing datasets. We then offer our perspectives on how appropriate use of ML can accelerate the implementation of systematic QbD within the biopharma 4.0 paradigm.
Background For the majority of individuals with early-onset or familial breast cancer referred for genetic testing, the genetic basis of their familial breast cancer remains unexplained. To identify novel germline variants associated with breast cancer predisposition, whole-exome sequencing (WES) was performed. Methods WES was performed on 290 BRCA1/BRCA2-negative Singaporeans with early-onset breast cancer and/or a family history of breast cancer. Case–control analysis against the East-Asian subpopulation (EAS) from the Genome Aggregation Database (gnomAD) identified variants enriched in cases, which were further selected by occurrence in cancer gene databases. Variants were further evaluated in repeated case–control analyses using a second case cohort from the database of Genotypes and Phenotypes (dbGaP) comprising 466 early-onset breast cancer patients from the United States, and a Singapore SG10K_Health control cohort. Results Forty-nine breast cancer-associated germline pathogenic variants in 37 genes were identified in Singapore cases versus gnomAD (EAS). Compared against SG10K_Health controls, 13 of 49 variants remained significantly enriched (False Discovery Rate (FDR)-adjusted p < 0.05). Comparing these 49 variants in dbGaP cases against gnomAD (EAS) and SG10K_Health controls revealed 23 concordant variants that were significantly enriched (FDR-adjusted p < 0.05). Fourteen variants were consistently enriched in breast cancer cases across all comparisons (FDR-adjusted p < 0.05). Seven variants in GPRIN2, NRG1, MYO5A, CLIP1, CUX1, GNAS and MGA were confirmed by Sanger sequencing. Conclusions We have identified pathogenic variants in genes associated with breast cancer predisposition. Importantly, many of these variants were significant in a second case cohort from dbGaP, suggesting that the strategy of using case–control analysis to select variants could potentially be utilized for identifying variants associated with cancer susceptibility.
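The case–control enrichment step with FDR adjustment described in the abstract above can be illustrated with a minimal sketch. The abstract does not name the statistical test or the correction procedure, so a one-sided Fisher's exact test and the Benjamini-Hochberg procedure are assumed here. The control cohort size, the variant labels and all carrier counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_enrichment_p(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher exact p-value: are carriers over-represented among cases?"""
    total = case_total + control_total
    carriers = case_carriers + control_carriers
    # Sum hypergeometric terms for outcomes at least as extreme as the observed one.
    upper = min(carriers, case_total)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, case_total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical cohort sizes and carrier counts: (case carriers, control carriers).
CASES, CONTROLS = 290, 9000
counts = {"variant_A": (6, 20), "variant_B": (2, 60), "variant_C": (4, 5)}
raw = [fisher_enrichment_p(a, CASES, c, CONTROLS) for a, c in counts.values()]
for name, p_adj in zip(counts, benjamini_hochberg(raw)):
    print(f"{name}: FDR-adjusted p = {p_adj:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point overflow for cohort sizes in the thousands; a statistics library would be the usual choice in practice.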
The current understanding of genetic susceptibility factors for nasopharyngeal carcinoma (NPC) is still incomplete. To identify novel germline variants associated with NPC predisposition, we analysed whole-exome sequencing data from 119 NPC patients from Singapore with a family history of NPC and/or with early-onset NPC, together with 1337 Singaporean participants without NPC. Variants were prioritised and filtered by selecting variants with minor allele frequencies of <1% in both local control (n = 1337) and gnomAD non-cancer (EAS) (n = 9626) cohorts and a high pathogenicity prediction (CADD score > 20). Using single-variant testing, we identified 17 rare pathogenic variants in 17 genes that were associated with NPC. Consistent evidence of enrichment in NPC patients was observed for five of these variants (in JAK2, PRDM16, LRP1B, NIN, and NKX2-1) from an independent case-control comparison of 156 NPC patients and 9770 unaffected individuals. In a family with five siblings, a FANCE variant (p.P445S) was detected in two affected members, but not in three unaffected members. Gene-based burden testing recapitulated variants in NKX2-1 and FANCE as being associated with NPC risk. Using pathway analysis, endocytosis and immune-modulating pathways were found to be enriched for mutation burden. This study has identified NPC-predisposing variants and genes, which could provide new insights into genetic predisposition to NPC.
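The prioritisation rule stated in the abstract above (minor allele frequency below 1% in both control cohorts and a CADD score above 20) can be sketched as a simple filter. The two thresholds come from the text. The `Variant` record layout, the function names and the example values are illustrative assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str          # placeholder gene label
    local_maf: float   # minor allele frequency in the local control cohort
    gnomad_maf: float  # minor allele frequency in gnomAD non-cancer (EAS)
    cadd: float        # CADD pathogenicity score


MAF_CUTOFF = 0.01   # "<1%" in the text
CADD_CUTOFF = 20.0  # "CADD score > 20" in the text


def is_rare_and_deleterious(v: Variant) -> bool:
    """Apply both frequency cutoffs and the pathogenicity cutoff."""
    return (
        v.local_maf < MAF_CUTOFF
        and v.gnomad_maf < MAF_CUTOFF
        and v.cadd > CADD_CUTOFF
    )


def prioritise(variants):
    """Keep only the variants that pass all three criteria."""
    return [v for v in variants if is_rare_and_deleterious(v)]


# Hypothetical records for illustration only.
example = [
    Variant("GENE1", 0.002, 0.001, 25.3),  # passes all criteria
    Variant("GENE2", 0.020, 0.004, 31.0),  # too common in local controls
    Variant("GENE3", 0.001, 0.000, 12.5),  # CADD score too low
]
print([v.gene for v in prioritise(example)])
```

Both cutoffs are strict inequalities, matching the "<1%" and "> 20" wording of the abstract.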
Introduction There is limited diversity in current hereditary multi-gene testing datasets, leading to greater challenges in accurate variant classification. To address this, a collaboration between three Singapore-based healthcare organizations (Tan Tock Seng Hospital, National Cancer Centre Singapore and Lucence Diagnostics) pooled results from patients of East Asian, Southeast Asian and South Asian ancestry, representing over 60% of the world's population. We hypothesized that a multi-ethnic collaboration would provide deeper understanding of cancer predisposition genes, particularly in terms of novel variants. Methods A total of 704 cancer patients of multi-ethnic Asian (Southeast Asian, East Asian and South Asian) ethnicity with a history of breast, ovarian, pancreatic or prostate cancer were tested with multi-gene panels that included, but were not limited to, BRCA1/2, PALB2 and ATM. In addition to genetic testing, the family histories of the patients were collected. All variants were classified by ACMG criteria. Chi-square testing was used for statistical analysis. Results Three sites (TTSH, NCCS, and Lucence) pooled patients selected for breast (n=458), ovarian (n=176), pancreatic (n=61) and prostate cancer (n=25). The mean and median ages were 44.9 years and 43 years, respectively. Of the 704 patients, 209 had a history of cancer in their first-degree relatives, 432 did not and 63 patients did not know of any cancers in their first-degree relatives. 122 of 704 patients (17.33%) had pathogenic/likely pathogenic variants in any tested risk-related gene, of which 86 were in BRCA1/2, 11 in PALB2 and 4 in ATM. Some variants were detected in more than one patient. 212 of 704 patients (30.1%) had VUSs detected in any tested risk-related gene, with 32.1% in BRCA1/2. There was a positive association between multiple-cancer status and pathogenic variants (9/22 vs 113/682, p = 0.007).
Most notably, among the unique pathogenic/likely pathogenic variants in BRCA1, BRCA2, PALB2 and ATM, 10.6% (5/47), 12.9% (4/31), 30% (3/10) and 25% (1/4) respectively were novel variants, not previously reported in ClinVar (Dec 2019). Conclusion To our knowledge, this is the largest regional multi-ethnic cohort of patients with breast, ovarian, pancreatic and prostate cancer undergoing comprehensive genetic testing. Only one third of patients reported a first-degree family history, suggesting that testing ought to be performed if clinical suspicion is high. Notably, 14.1% of BRCA1/BRCA2/PALB2/ATM pathogenic/likely pathogenic variants detected in our cohort were novel variants, not hitherto published in ClinVar. This collaboration demonstrates that testing of Asian patients can enrich global understanding of cancer predisposition gene mutations. This will improve cancer prevention, surveillance, and treatment selection for cancer patients, such as the use of PARP inhibitors for genetic defects of DNA repair. Citation Format: Jens Samol, Wei Lim Chia, Liuh Ling Goh, Matthew Myint, Min-Han Tan, Ru Jin Tay, Hao Chen, Yukti Choudhury, Ann SG Lee. Germline homologous recombination deficiency pathway defects in a multi-ethnic East Asian, Southeast Asian and South Asian cancer patient cohort [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 4600.
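The association reported in the Results of this abstract (pathogenic variants in 9 of 22 patients with multiple cancers versus 113 of 682 other patients) can be checked with a chi-square test on the 2x2 table. The sketch below is a standard-library implementation, not the authors' analysis code. The abstract does not say whether a continuity correction was applied, so Yates' correction is an assumption here; with it, the p-value comes out at approximately 0.007, in line with the reported figure.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction (assumed, not stated in the abstract).
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Carriers and non-carriers: multiple-cancer patients, then all other patients.
stat, p = chi_square_2x2(9, 22 - 9, 113, 682 - 113)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Passing `yates=False` gives the uncorrected Pearson statistic, which yields a smaller p-value for the same table.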