2023
DOI: 10.1021/acs.chemrev.3c00189

Machine Learning Methods for Small Data Challenges in Molecular Science

Abstract: Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity…

Citations: cited by 94 publications (34 citation statements)
References: 581 publications

“…The applications in image analysis, biological information clustering and extraction, drug discovery, and disease prediction provide great inspiration for the processing of MSI data. In recent years, the vigorous development of proteomics, lipomics, and metabolomics is also attributed to the continuous advancement of machine learning algorithms. Nevertheless, the research mainly focuses on tissue sections or single cells detached from the tissue environment due to the spatial resolution of micrometers. For instance, Caroline R. Bartman et al measured the TCA flux and ATP production in solid tumors and they discovered the obvious difference between primary and metastatic solid tumors.…”
Section: Concluding Remarks and Perspective
Citation type: mentioning
confidence: 99%
“…As computational tools at all stages of drug discovery and development have been extensively reviewed recently, in this perspective, we focus on the recent progress related to our group. Specifically, we first summarize protein pocket identification and analysis methods based on fragment-centric topographical mapping.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…[49−60] The accurate prediction of molecular properties requires models to learn more robust molecular representations for 1D SMILES [61], 2D graph [62], and 3D geometry [63]. As computational tools at all stages of drug discovery and development have been extensively reviewed recently [64,65], in this perspective, we focus on the recent progress related to our group. Specifically, we first summarize protein pocket identification and analysis methods based on fragment-centric topographical mapping.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…It is then not surprising that the recent explosive development of machine learning (ML) techniques, including deep neural networks (DNNs), is already making a noticeable impact on this field, including works aimed directly at improving the accuracy of the description of complex solvation effects. It should be noted that the majority of these recent works combine quantum mechanics (QM)-based methodology with ML, while our interest here is purely classical approaches. Among recent purely ML-based approaches, a featurization algorithm, based on functional class fingerprints and implemented within the DeepChem ML framework, was used in ref to predict the hydration-free energies (HFEs) of a diverse set of 642 neutral small molecules available in FreeSolv, arguably the largest public database of experimentally measured HFEs.…”
Section: Introduction
Citation type: mentioning
confidence: 99%