2023
DOI: 10.1038/s41467-023-40782-0

DECIMER.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications

Kohulan Rajan,
Henning Otto Brinkhaus,
M. Isabel Agea
et al.

Abstract: The number of publications describing chemical structures has increased steadily over the last decades. However, the majority of published chemical information is currently not available in machine-readable form in public databases. It remains a challenge to automate the process of information extraction in a way that requires less manual intervention - especially the mining of chemical structure depictions. As an open-source platform that leverages recent advancements in deep learning, computer vision, and na…

Cited by 21 publications (14 citation statements)
References 68 publications
“…2 Other publications have used the same exact prediction and the Tanimoto similarity as an additional metric. 4,7 SwinOCSR also uses BLEU and ROUGE 7 which are N-gram based precision methods developed for machine translation. OCMR has used the Levenshtein distance between the predicted SMILES string and the ground truth SMILES to quantify the dissimilarity of the predicted SMILES.…”
Section: Methods | Citation type: mentioning
confidence: 99%
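The Levenshtein distance named in the statement above can be illustrated with a minimal sketch. This is a generic character-level edit distance in plain Python, not the implementation used by OCMR or any other cited tool; the function name `levenshtein` and the example SMILES strings are assumptions chosen for illustration. Note that a character-level comparison only makes sense if both strings are written in the same (for example, canonical) SMILES form.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical example: ground truth vs. predicted SMILES.
    truth = "CC(=O)O"
    predicted = "CC(=O)N"
    print(levenshtein(predicted, truth))
```

A distance of zero corresponds to an exact string match; larger values indicate a more dissimilar prediction.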
“…In contrast, the conversion of images to chemical structures (optical chemical structure recognition, OCSR) still represents a significant challenge for established software tools like Kekulé, CLiDE, OSRA and others described by Rajan et al 1 In the recent past, the development of artificial intelligence (AI) such as transformer based machine learning tools has ignited a novel interest in image processing and the creation of generative models that lead to the rapid development of novel applications for predicting chemical structures from their images. Such AI based image-to-structure methods are MolScribe 2 and RxnScribe, 3 DECIMER, 4 ReactionDataExtractor, 5 Img2Mol, 6 SwinOCSR 7 and OCMR 8 that have recently become available and that were shown to outperform previous rule based, analytical methods both in recall as well as in precision.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…As the iBonD database only contains an image of each molecule, we employ the tool Deep Learning for Chemical Image Recognition software (DECIMER v. 2.0), developed by Rajan et al. [8][9][10] While DECIMER converts molecular images into SMILES, manual intervention is required to ensure the SMILES string correctly represents the molecule. Finally, to mirror the dataset by Roszak et al, 3 we also incorporate 43 heterocycles without experimental pKa values from Shen et al, leaving us with a dataset of 775 compounds.…”
Section: Methods Datasets | Citation type: mentioning
confidence: 99%
“…In 2023, DECIMER.ai, an open-source platform, aimed at addressing the challenge of identifying, segmenting, and recognizing chemical structure depictions in scientific literature. The platform incorporates DECIMER Segmentation, a toolkit that utilizes Mask R-CNN 25 for detecting and segmenting chemical structures.…”
Section: Related Work | Citation type: mentioning
confidence: 99%