2023
DOI: 10.1021/acs.jcim.3c00607

Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data

Abstract: The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily…
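The abstract contrasts unstructured reporting with machine-readable reaction data. As a rough illustration only, the sketch below shows a hypothetical structured reaction record in Python. The class and field names (`Compound`, `ReactionRecord`, `by_role`, `to_json`) are assumptions, not the format proposed in the paper, and community schemas such as the Open Reaction Database define far richer structures. The point is that roles, amounts, conditions, and outcomes become explicit fields that software can query, rather than prose that must be parsed.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Compound:
    smiles: str                          # structure as a SMILES string
    role: str                            # e.g. "reactant", "reagent", "solvent", "product"
    amount_mmol: Optional[float] = None  # None means "not reported", not zero


@dataclass
class ReactionRecord:
    compounds: List[Compound]
    temperature_c: Optional[float] = None
    time_h: Optional[float] = None
    # An explicit 0.0 records a failed (negative) result; None means unknown.
    yield_percent: Optional[float] = None

    def by_role(self, role: str) -> List[Compound]:
        """Return every compound that plays the given role."""
        return [c for c in self.compounds if c.role == role]

    def to_json(self) -> str:
        """Serialize to JSON so the record can be shared and parsed by machines."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical Fischer esterification; amounts and conditions are illustrative only.
record = ReactionRecord(
    compounds=[
        Compound("CC(=O)O", "reactant", 10.0),   # acetic acid
        Compound("CCO", "reactant", 12.0),       # ethanol
        Compound("OS(=O)(=O)O", "reagent"),      # sulfuric acid, amount not reported
        Compound("CCOC(C)=O", "product"),        # ethyl acetate
    ],
    temperature_c=78.0,
    time_h=4.0,
)

print(record.to_json())
```

Keeping `None` distinct from `0.0` is a deliberate choice in this sketch: it lets a failed reaction be stored as data instead of silently disappearing from the record.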

Cited by 14 publications (13 citation statements)
References 65 publications

“…Addressing these challenges necessitates enhanced dataset quality and size, and standardized data reporting. 84,86,87 This comprehensive understanding highlights the pivotal role of data quality, underscores the [importance] of standardized reporting practices, and emphasizes the collective effort required to enhance dataset quality and modeling techniques, ultimately unlocking the full potential of ML in driving innovation within the OPV and broader chemistry domains.…”
Section: Discussion (mentioning)
confidence: 98%
“…Both domains face challenges involving two molecular structures as input features, processing variables, sparse data, non-smooth response surfaces, and biases in reported data, notably the underreporting of negative results. 69,82–85 Acknowledging these parallels emphasizes the broader challenges in predictive modeling within chemistry-driven domains. Addressing these challenges necessitates enhanced dataset quality and size, and standardized data reporting.…”
Section: Discussion (mentioning)
confidence: 99%
“…The size of the corpus of chemical reaction data obtained from public, proprietary, and licensed sources has progressively grown over the years, matched by an increasing demand for extracting more value from chemical experiments. As a result, scientists are increasingly relying on computational tools for effective data navigation and analysis. For example, the Network of Organic Chemistry (NOC) approach converts individual chemical reactions into a graph-like object, enabling powerful graph-based searches and facilitating the discovery of novel synthetic routes. Additionally, advancements in predictive modeling [have] led to the development of Computer-Aided Synthesis Planning (CASP) tools, which provide actionable insight in the form of synthetic plans to molecular targets.…”
Section: Introduction (mentioning)
confidence: 99%
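The NOC approach quoted above turns individual reactions into a graph that can be searched for synthetic routes. The cited work's actual implementation is not shown here; as a minimal sketch under stated assumptions, the Python below builds a toy molecule-to-reaction network (hypothetical molecule labels A to T and hypothetical functions `build_network` and `find_route`) and uses forward chaining, a breadth-first closure from the available starting materials, to find a reaction sequence that reaches a target.

```python
from collections import deque

# Hypothetical toy reaction set: (reactants, product). Labels are placeholders.
REACTIONS = [
    (frozenset({"A", "B"}), "C"),
    (frozenset({"C", "D"}), "E"),
    (frozenset({"A", "F"}), "G"),
    (frozenset({"E", "G"}), "T"),
]


def build_network(reactions):
    """Map each molecule to the indices of the reactions that consume it."""
    consumers = {}
    for idx, (reactants, _product) in enumerate(reactions):
        for mol in reactants:
            consumers.setdefault(mol, []).append(idx)
    return consumers


def find_route(reactions, starting, target):
    """Forward-chain from the starting materials; return reaction indices
    in an executable order that makes `target`, or None if unreachable."""
    starting = set(starting)
    consumers = build_network(reactions)
    pending = [len(reactants) for reactants, _ in reactions]  # unmet reactants
    made_by = {mol: None for mol in starting}  # None marks a starting material
    queue = deque(sorted(starting))
    while queue:
        mol = queue.popleft()
        for idx in consumers.get(mol, []):
            pending[idx] -= 1
            product = reactions[idx][1]
            if pending[idx] == 0 and product not in made_by:
                made_by[product] = idx
                queue.append(product)
    if target not in made_by:
        return None

    route, seen = [], set()

    def visit(mol):
        idx = made_by[mol]
        if idx is None or idx in seen:
            return
        seen.add(idx)
        for reactant in sorted(reactions[idx][0]):
            visit(reactant)
        route.append(idx)  # appended only after all its precursors

    visit(target)
    return route


route = find_route(REACTIONS, {"A", "B", "D", "F"}, "T")
for idx in route:
    reactants, product = REACTIONS[idx]
    print(" + ".join(sorted(reactants)), "->", product)
```

Forward chaining is only one way to search such a network; CASP tools typically work backward from the target (retrosynthesis) and score competing routes, which this toy version does not attempt.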