2011
DOI: 10.1145/1961209.1961215

Identifying, Indexing, and Ranking Chemical Formulae and Chemical Names in Digital Documents

Abstract: End users use chemical search engines to search for chemical formulae and chemical names. Chemical search engines identify and index chemical formulae and chemical names appearing in text documents to support efficient search and retrieval in the future. Identifying chemical formulae and chemical names in text automatically has been a hard problem that has met with varying degrees of success in the past. We propose algorithms for chemical formula and chemical name tagging using Conditional Random Fields (CRFs)…

Cited by 12 publications (5 citation statements)
References 44 publications
“…Our interest was initially in the Chemical Entity Mention (CEM) task as it is the prerequisite to the following Chemical Document Indexing (CDI) task. We started by running a distribution of ChemXSeer's formula and name extractor [7-9] that is released for the general public on the training and development datasets. The tagger is based on Conditional Random Fields (CRF) [4] models, with additional rules for pre and post processing documents.…”
Section: Methods (mentioning)
confidence: 99%
“…The model uses the concept of trees to store individual IS which was more complex in terms of memory and time. Sun et al [6] have proposed an algorithm for ranking chemical formulae and tagging chemical names in digital documents. This algorithm uses Conditional Random Fields (CRFs) and Support Vector Machines (SVMs).…”
Section: Problem Identification and Related Work (mentioning)
confidence: 99%
“…We, likewise, employ customised tokenisation and named entity recognition while pre-processing the corpus (Corbett and Boyle, 2018) to enable future researchers to forgo the NER step. Further related applications of NLP include information retrieval (Sun et al, 2011; Hawizy et al, 2011) and literature mining (Zaslavsky et al, 2017; Öztürk et al, 2020), while lessons learned from deep learning in NLP have been applied to strings representing chemical structures to successfully discover new potential antibiotics (Stokes et al, 2020).…”
Section: Related Work (mentioning)
confidence: 99%