2016
DOI: 10.1093/database/baw135

NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition

Abstract: Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce a system, NERChem, that can recognize chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which is used for combining chemical classes whose naming conventions are similar; (2) BioNE features, which are used for distinguishing chemical me…

Cited by 4 publications (5 citation statements)
References 16 publications
“…Refer to the practice of Tsai et al [31], we employ the GENIA Tagger [32] to process input documents, including tokenization, POS tagging and chunking. All of these provide features for our BiLSTM-CRF model to further enrich the information of each word.…”
Section: Feature Extraction (mentioning)
confidence: 99%
“…The statistical principle-based approach is used to identify protein mentions and achieved the highest score in terms of the second evaluation metric of the BioCreative V.5 Gene and protein related object recognition (GPRO) task (20). The CRF-based NERChem (21) is used to identify chemical mentions. Finally, the dictionary-based approach is used to recognize disease and biological process mentions by using external dictionaries including Entrez, ChEBI and BEL official dictionaries, which are also used to normalize each recognized NE mention to its database identifier.…”
Section: Methods (mentioning)
confidence: 99%
“…Thus, we find that GPRO mentions were usually substrings of SPBA’s NEs. To identify GPRO mentions, we employ our previous chemical name recognizer, NERChem [17], which bases on the CRF model. Firstly, we employ the GENIATagger [18] to segment every sentence into a sequence of tokens.…”
Section: Methods (mentioning)
confidence: 99%
“…Firstly, we employ the GENIATagger [18] to segment every sentence into a sequence of tokens. Then, we run a sub-tokenization module used in our previous work [17] to further segment tokens into sub-tokens. We use the SOBIE tag-scheme which has nine labels including B-GPRO_TYPE_1, I-GPRO_TYPE_1, E-GPRO_TYPE_1, S-GPRO_TYPE_1, B-GPRO_TYPE_2, I-GPRO_TYPE_2, E-GPRO_TYPE_2, and S-GPRO_TYPE_2, and O.…”
Section: Methods (mentioning)
confidence: 99%
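
The SOBIE tag scheme mentioned in the last statement can be illustrated with a minimal sketch. It assumes entity mentions are given as token spans with an exclusive end index; the function names `sobie_label_set` and `sobie_tags` and the example spans are hypothetical and are not taken from NERChem or the citing papers.

```python
from typing import List, Tuple

# Hypothetical helper: enumerate the SOBIE label inventory for a set of entity types.
def sobie_label_set(entity_types: List[str]) -> List[str]:
    labels = [f"{prefix}-{etype}" for etype in entity_types for prefix in "BIES"]
    return labels + ["O"]

# Hypothetical helper: convert entity spans into one SOBIE label per token.
# Each span is (start, end, entity_type) with `end` exclusive (an assumption).
def sobie_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if end - start == 1:
            # A single-token mention gets the S (single) label.
            tags[start] = f"S-{etype}"
        else:
            # Multi-token mention: B (begin), I (inside), E (end).
            tags[start] = f"B-{etype}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"
            tags[end - 1] = f"E-{etype}"
    return tags
```

With the two entity types named in the statement, `sobie_label_set(["GPRO_TYPE_1", "GPRO_TYPE_2"])` yields four labels per type plus `O`, which gives the nine labels the statement describes.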