Machine Learning-Based Models with High Accuracy and Broad Applicability Domains for Screening PMT/vPvM Substances

Yu, Yang; Gao, Yuchen; Shen, Lilai; Cui, Shixuan; Gou, Yiyuan; Zhang, Chunlong; Zhuang, Shulin; Jiang, Guibin

doi:10.1021/acs.est.2c06155

Cited by 19 publications

(20 citation statements)

References 46 publications


“…The broad chemical space of the dataset is critical for the ML modeling and can widen AD of the classifiers, as can also be concluded from a previous study. 18 The number of compounds in most of the communities also increased after the data incorporation. It can be perceived that the more the communities in a dataset, the more the compounds in communities, the more the ML models can learn, and the higher performance the models have.…”

Section: Results and Discussion (mentioning)

confidence: 93%

“…For the 22 classifiers, the differences (δ) between A ROC on the training set and those on the test set range from 0.027 to 0.076 (Table S3). It can be concluded from previous studies that if δ/A ROC ≤ 10% (A ROC on the training set), the corresponding classifiers are free of overfitting. 18,57,58 For the 22 classifiers, the δ/A ROC values are all ≤8%. Therefore, there is scarcely any overfitting in the constructed ML classifiers.…”

Section: Results (mentioning)

confidence: 96%
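The overfitting criterion quoted above (δ/A ROC ≤ 10%, with A ROC taken on the training set) can be sketched in a few lines. This is only an illustrative reading of the quoted rule, not the citing authors' code: the function names, the default threshold argument, and the example values are assumptions, and δ is taken here as training A ROC minus test A ROC.

```python
def overfitting_ratio(auc_train: float, auc_test: float) -> float:
    """Return delta / A_ROC(train), where delta = A_ROC(train) - A_ROC(test).

    Assumption: delta is signed (train minus test), so a model that scores
    higher on the test set yields a negative ratio.
    """
    if not 0.0 < auc_train <= 1.0:
        raise ValueError("auc_train must be in (0, 1]")
    if not 0.0 <= auc_test <= 1.0:
        raise ValueError("auc_test must be in [0, 1]")
    delta = auc_train - auc_test
    return delta / auc_train


def is_free_of_overfitting(
    auc_train: float, auc_test: float, threshold: float = 0.10
) -> bool:
    """Apply the quoted rule: delta / A_ROC(train) <= threshold (10% by default)."""
    return overfitting_ratio(auc_train, auc_test) <= threshold


if __name__ == "__main__":
    # Hypothetical A ROC values, chosen only to exercise the rule.
    for train, test in [(0.95, 0.90), (0.95, 0.80)]:
        ratio = overfitting_ratio(train, test)
        verdict = is_free_of_overfitting(train, test)
        print(f"train={train:.2f} test={test:.2f} ratio={ratio:.3f} ok={verdict}")
```

Because the ratio is normalized by the training A ROC, the same absolute gap δ weighs more heavily for a classifier whose training A ROC is lower.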

“…set), the corresponding classifiers are free of overfitting. 18,57,58 For the 22 classifiers, the δ/A ROC values are all ≤8%. Therefore, there is scarcely any overfitting in the constructed ML classifiers.…”

Section: Results and Discussion (mentioning)

confidence: 99%

“…Quantitative structure−activity relationship (QSAR) models, as well-established representatives in computational toxicology, 17,18 have been proven to be successful in screening hazardous chemicals. 19−21 There have been previous efforts to develop machine learning (ML)-based QSAR models for screening TSHR agonists.…”

Section: Introduction (mentioning)

confidence: 99%

“…Quantitative structure–activity relationship (QSAR) models, as well-established representatives in computational toxicology, 17,18 have been proven to be successful in screening hazardous chemicals. 19−21 There have been previous efforts to develop machine learning (ML)-based QSAR models for screening TSHR agonists. However, the previous models were trained with either small datasets or severely imbalanced datasets of classifications. It is known that ML-based QSAR models built on small datasets can be unreliable, and the learned knowledge is fragile.…”

Section: Introduction (mentioning)

confidence: 99%


Machine Learning Model for Screening Thyroid Stimulating Hormone Receptor Agonists Based on Updated Datasets and Improved Applicability Domain Metrics

Liu, Wang, Chen, et al. 2023

Chem. Res. Toxicol.

Machine learning (ML) models for screening endocrine-disrupting chemicals (EDCs), such as thyroid stimulating hormone receptor (TSHR) agonists, are essential for sound management of chemicals. Previous models for screening TSHR agonists were built on imbalanced datasets and lacked applicability domain (AD) characterization essential for regulatory application. Herein, an updated TSHR agonist dataset was built, for which the ratio of active to inactive compounds greatly increased to 1:2.6, and chemical spaces of structure–activity landscapes (SALs) were enhanced. Resulting models based on 7 molecular representations and 4 ML algorithms were proven to outperform previous ones. Weighted similarity density (ρs) and weighted inconsistency of activities (I A) were proposed to characterize the SALs, and a state-of-the-art AD characterization methodology ADSAL{ρs, I A} was established. An optimal classifier developed with PubChem fingerprints and the random forest algorithm, coupled with ADSAL{ρs ≥ 0.15, I A ≤ 0.65}, exhibited good performance on the validation set with the area under the receiver operating characteristic curve being 0.984 and balanced accuracy being 0.941 and identified 90 TSHR agonist classes that could not be found previously. The classifier together with the ADSAL{ρs, I A} may serve as efficient tools for screening EDCs, and the AD characterization methodology may be applied to other ML models.


MatGPT: A Vane of Materials Informatics from Past, Present, to Future

Wang, Chen, Tao, et al. 2023

Advanced Materials

Combining materials science, artificial intelligence (AI), physical chemistry and other disciplines, materials informatics is continuously accelerating the vigorous development of new materials. The emergence of “GPT AI” shows that the scientific research field has entered the era of intelligent civilization with “data” as the basic factor and “algorithm + computing power” as the core productivity. The continuous innovation of AI will impact the cognitive laws and scientific methods, and reconstruct the knowledge and wisdom system. This leads us to think more about materials informatics, both in terms of opportunities and challenges. In this review, we provide a comprehensive discussion of AI models, materials infrastructures, and review the current advances in the discovery and design of new materials. With the rise of new research paradigms triggered by “AI for Science”, we propose the vane of materials informatics: “MatGPT”, and carry out the technical path planning from the aspects of data, descriptors, generative models, pre‐training models, directed design models, collaborative training, experimental robots, as well as the efforts and preparations needed to develop a new generation of materials informatics. Finally, we discuss the challenges and constraints faced by materials informatics, in order to achieve a more digital, intelligent and automated construction of materials informatics with the joint efforts of more interdisciplinary scientists.


Advances and applications of machine learning and deep learning in environmental ecology and health

Cui, Gao, Huang, et al. 2023

Environmental Pollution
