Background: Quantitative structure-activity relationship (QSAR) modeling is a computational method for revealing relationships between the structural properties of chemical compounds and their biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based machine learning approaches have been used to overcome these constraints and obtain reliable predictions. Ensemble learning builds a set of diversified models and combines them. However, random forest, the most prevalent approach, and other ensemble approaches in QSAR prediction limit their model diversity to a single subject. Results: The proposed ensemble method consistently outperformed 13 individual models on 19 bioassay datasets and demonstrated superiority over other ensemble approaches that are limited to a single subject. The comprehensive ensemble method is publicly available at http://data.snu.ac.kr/QSAR/. Conclusions: We propose a comprehensive ensemble method that builds multi-subject diversified models and combines them through second-level meta-learning. In addition, we propose an end-to-end neural network-based individual classifier that can automatically extract sequential features from a simplified molecular-input line-entry system (SMILES). The proposed individual classifier did not show impressive results as a single model, but it was considered the most important predictor when combined, according to the interpretation of the meta-learning.
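The idea of combining first-level models through second-level meta-learning can be illustrated with a minimal sketch. It assumes a logistic-regression meta-learner trained on the predicted probabilities of two hypothetical first-level models, one informative and one pure noise; the abstract does not say which meta-learner or which first-level outputs the authors used, so every name and all data below are illustrative only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(level1_preds, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on first-level predictions by gradient descent.

    level1_preds holds one row per compound and one column per first-level model.
    """
    n_models = len(level1_preds[0])
    n = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(level1_preds, labels):
            p = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
            err = p - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def meta_predict(weights, bias, row):
    """Combined probability that a compound is active."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


# Synthetic data (an assumption, not from the paper):
# model A tracks the label, model B outputs random noise.
random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
level1 = [
    [0.15 + 0.7 * y + random.uniform(-0.15, 0.15), random.random()]
    for y in labels
]

weights, bias = fit_meta_learner(level1, labels)
accuracy = sum(
    (meta_predict(weights, bias, row) >= 0.5) == bool(y)
    for row, y in zip(level1, labels)
) / len(labels)
```

In this toy setting the learned weights act as a crude importance score for each first-level model, which is one simple way a meta-learner can be interpreted. In practice the meta-learner would normally be trained on out-of-fold predictions so that it does not reward first-level overfitting.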
A molecule is a complex of heterogeneous components, and the spatial arrangement of these components determines the properties and characteristics of the molecule as a whole. With the advent of deep learning in computational chemistry, several studies have focused on how to predict molecular properties based on molecular configurations. A message-passing neural network provides an effective framework for capturing molecular geometric features by treating a molecule as a graph. However, most of these studies assumed that all heterogeneous molecular features, such as atomic charge, bond length, or other geometric features, always contribute equivalently to the target prediction, regardless of the task type. In this study, we propose a dual-branched neural network for molecular property prediction based on both the message-passing framework and standard multilayer perceptron neural networks. Our model learns heterogeneous molecular features with different scales, which are trained flexibly according to each prediction target. In addition, we introduce a discrete branch to learn single-atom features without local aggregation, apart from message-passing steps. We verify that this novel structure can improve the model performance. The proposed model outperforms other recent models with sparser representations. Our experimental results indicate that, in chemical property prediction tasks, the diverse chemical nature of targets should be carefully considered for both model performance and generalizability. Finally, we provide an intuitive analysis relating the experimental results to the chemical meaning of the target.
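The dual-branch idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' architecture: the sum aggregation, the ReLU layers, the sum readout, and all function names and weight values are hypothetical choices made only to show one branch that aggregates over neighboring atoms and a second, discrete branch that transforms each atom's features without any aggregation.

```python
def message_passing_step(atom_feats, bonds):
    """One round of sum aggregation: each atom adds its bonded neighbors' features."""
    updated = [list(f) for f in atom_feats]
    for i, j in bonds:  # bonds are undirected pairs of atom indices
        for k in range(len(atom_feats[0])):
            updated[i][k] += atom_feats[j][k]
            updated[j][k] += atom_feats[i][k]
    return updated


def dense_relu(vec, weights, bias):
    """A single fully connected layer with ReLU activation."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]


def sum_readout(atom_vectors):
    """Order-independent pooling of per-atom vectors into one molecule vector."""
    return [sum(col) for col in zip(*atom_vectors)]


def dual_branch_embedding(atom_feats, bonds, w_graph, b_graph, w_atom, b_atom, steps=2):
    # Graph branch: repeated neighbor aggregation followed by a shared layer.
    h = atom_feats
    for _ in range(steps):
        h = [dense_relu(v, w_graph, b_graph) for v in message_passing_step(h, bonds)]
    graph_branch = sum_readout(h)
    # Discrete branch: per-atom transform with no neighbor aggregation.
    atom_branch = sum_readout([dense_relu(v, w_atom, b_atom) for v in atom_feats])
    # Concatenate both views; a task-specific head would consume this vector.
    return graph_branch + atom_branch


# Toy water-like molecule; features are (atomic number, made-up partial charge).
atoms = [[8.0, -0.8], [1.0, 0.4], [1.0, 0.4]]
bonds = [(0, 1), (0, 2)]
w_graph, b_graph = [[0.1, 0.2], [-0.3, 0.4]], [0.0, 0.1]
w_atom, b_atom = [[0.5, 0.0], [0.0, 1.0], [0.2, -0.1]], [0.0, 0.0, 0.0]

embedding = dual_branch_embedding(atoms, bonds, w_graph, b_graph, w_atom, b_atom)
```

Because the discrete branch never looks at `bonds`, its part of the embedding depends only on which atoms are present, while the graph branch changes with connectivity. Training the two branches with separate weights is what would let a model weight the two kinds of information differently per target.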
Purpose: Multigene assays provide useful prognostic information regarding hormone receptor (HR)-positive breast cancer. Next-generation sequencing (NGS)-based platforms have numerous advantages, including reproducibility and adaptability in local laboratories. This study aimed to develop and validate an NGS-based multigene assay to predict the distant recurrence risk. Experimental Design: In total, 179 genes, including 30 reference genes, highly correlated with the 21-gene recurrence score (RS) algorithm were selected from public databases. Targeted RNA-sequencing was performed using 250 and 93 archived breast cancer samples with a known RS in the training and verification sets, respectively, to develop the algorithm and NGS–Prognostic Score (NGS-PS). The assay was validated in 413 independent samples with long-term follow-up data on distant metastasis. Results: In the verification set, the NGS-PS and 21-gene RS displayed 91.4% concurrence (85/93 samples). In the validation cohort of 413 samples, the area under the receiver operating characteristic curve for NGS-PS values as a classifier of distant recurrence was 0.76. The best NGS-PS cut-off value predicting distant metastasis was 20. With this cut-off, 269 and 144 patients were classified as low- and high-risk patients, respectively. Five- and 10-year estimates of distant metastasis–free survival (DMFS) for low- versus high-risk groups were 97.0% versus 77.8% and 93.2% versus 64.4%, respectively. The age-related hazard ratio for distant recurrence without chemotherapy was 9.73 (95% CI, 3.59–26.40) and 3.19 (95% CI, 1.40–7.29) for patients aged ≤50 and >50 years, respectively. Conclusions: The newly developed and validated NGS-based multigene assay can predict the distant recurrence risk in ER-positive, HER2-negative breast cancer.
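The arithmetic behind these summary statistics can be sketched as follows. The concurrence figure uses the counts reported above; the scores and outcomes in the AUC example are made-up toy values, not study data, and the rule that a score at or above the cut-off counts as high risk is an assumption, since the abstract does not state how a score of exactly 20 is handled.

```python
def roc_auc(scores, events):
    """Probability that a random event case outscores a random non-event case.

    Ties count as half a win, which equals the area under the ROC curve.
    """
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def risk_group(score, cutoff=20):
    # Assumption: scores at or above the cut-off are labeled high risk.
    return "high" if score >= cutoff else "low"


# Concurrence reported for the verification set: 85 of 93 samples.
concurrence_pct = round(100 * 85 / 93, 1)

# Toy values for illustration only (1 = distant recurrence observed).
toy_scores = [5, 12, 18, 22, 31, 40]
toy_events = [0, 0, 1, 0, 1, 1]
toy_auc = roc_auc(toy_scores, toy_events)
toy_groups = [risk_group(s) for s in toy_scores]
```

The pairwise definition used in `roc_auc` is quadratic in the number of samples, which is fine for a cohort of a few hundred patients; a rank-based formula would be preferable for much larger data.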