Hadoop Recognition of Biomedical Named Entity Using Conditional Random Fields

Li, Kenli; Ai, Wei; Tang, Zhuo; Zhang, Fan; Jiang, Lingang; Li, Keqin; Hwang, Kai

doi:10.1109/tpds.2014.2368568

Cited by 63 publications

(30 citation statements)

References 36 publications


“…For example, majority of the systems submitted to the JNLPBA challenge made use of machine learning algorithms which have been observed to significantly outperform the dictionary based methods. Some of the recent works in BNER includes the unsupervised model as proposed in (Zhang and Elhadad, 2013), and the system based on CRF (Li et al, 2015a). A two-phase approach based on semi-Markov CRF is proposed in (Yang and Zhou, 2014).…”

Section: Related Work (mentioning)

confidence: 99%

Entity Extraction in Biomedical Corpora: An Approach to Evaluate Word Embedding Features with PSO based Feature Selection

Yadav, Ekbal, Saha et al. 2017

Proceedings of the 15th Conference of the European Chapter of The Association for Computational Linguistics: Volume 1


Text mining has drawn significant attention in recent past due to the rapid growth in biomedical and clinical records. Entity extraction is one of the fundamental components for biomedical text mining. In this paper, we propose a novel approach of feature selection for entity extraction that exploits the concept of deep learning and Particle Swarm Optimization (PSO). The system utilizes word embedding features along with several other features extracted by studying the properties of the datasets. We obtain an interesting observation that compact word embedding features as determined by PSO are more effective compared to the entire word embedding feature set for entity extraction. The proposed system is evaluated on three benchmark biomedical datasets such as GENIA, GENETAG and AiMed. The effectiveness of the proposed approach is evident with significant performance gains over the baseline models as well as the other existing systems. We observe improvements of 7.86%, 5.27% and 7.25% F-measure points over the baseline models for GENIA, GENETAG, and AiMed dataset respectively.


“…To attack the unsymmetrical co-occurrence problem of PMI, EPMI was proposed and defined to extract prototypical words based on extended mutual information (EMI) and PMI² [37]. To generate the co-occurrence vector v for the word w_i, the co-occurrence relation between the word w_i and every word w_j from the dataset was determined using EPMI, which is derived from extended mutual information EMI and PMI, as per Equations (4) and (5):…”

Section: Extended Distributed Prototypical Methods (mentioning)

confidence: 99%

A Comparative Study of Word Representation Methods With Conditional Random Fields and Maximum Entropy Markov for Bio-Named Entity Recognition

Tareq, Mohd 2018

MJCS


Bio-Named Entity Recognition (Bio-NER) is the process of identifying and semantically classifying biomedical technical terms and named entities in Biomedicine literature. Therefore, it is a major task in biomedical knowledge acquisition. Meanwhile, Natural Language Processing (NLP) plays an important role in Bio-NER in the biomedical domain. The first and most essential biomedical literature mining task incorporates biomedical entity recognition such as protein, gene, and chemicals. The most recent Bio-NER methods rely on predefined traditional features, which attempt to capture the specific surface properties of entity types. However, these empirically predefined feature sets differ between entity types and are manually constructed and complicated, which means developing them is costly. In this paper, we systematically present a comparative evaluation study of three methods, which are: the traditional feature representation method, the continuous bag-of-words (CBOW) model, and a new prototypical representation method with two popular sequence-labeling approaches (Conditional Random Fields (CRFs) and Maximum Entropy Markov Models (MEMM)). We evaluated these models with two major Bio-NER tasks, which involve the JNLPBA and GENETAG corpora. This paper examined the prototypical word representation method and found that Word2Vec can be successfully used for Bio-NER. Our results show that the new prototypical representation method improved the performance of the two machine learning models with different datasets. Also, the new prototypical representation method performed better than the traditional feature representation method and CBOW model for both datasets. Finally, our experiment proved that the CRF classifier with the new prototypical representation method achieved the best results when 90% data was used as training data, yielding overall F-measure values of 0.79% and 0.85% for the JNLPBA corpus and GENETAG corpus, respectively. In comparison, the results achieved using the ME classifier yielded overall F-measure values of 0.76% and 0.78% for the JNLPBA corpus and GENETAG corpus, respectively.


“…The CRF model is a discriminant probability, undirected graph learning model proposed by Lafferty [8] based on the maximum entropy model [54] and hidden Markov model [55]. CRF was first proposed for sequence data analysis and has been successfully applied in the fields of natural language processing (NLP), bioinformatics, machine vision, and network intelligence [56][57][58][59].…”

Section: CRF (mentioning)

confidence: 99%

Deep Learning-Based Named Entity Recognition and Knowledge Graph Construction for Geological Hazards

Fan, Wang, Yan et al. 2019

IJGI


Constructing a knowledge graph of geological hazards literature can facilitate the reuse of geological hazards literature and provide a reference for geological hazard governance. Named entity recognition (NER), as a core technology for constructing a geological hazard knowledge graph, has to face the challenges that named entities in geological hazard literature are diverse in form, ambiguous in semantics, and uncertain in context. This can introduce difficulties in designing practical features during the NER classification. To address the above problem, this paper proposes a deep learning-based NER model; namely, the deep, multi-branch BiGRU-CRF model, which combines a multi-branch bidirectional gated recurrent unit (BiGRU) layer and a conditional random field (CRF) model. In an end-to-end and supervised process, the proposed model automatically learns and transforms features by a multi-branch bidirectional GRU layer and enhances the output with a CRF layer. Besides the deep, multi-branch BiGRU-CRF model, we also proposed a pattern-based corpus construction method to construct the corpus needed for the deep, multi-branch BiGRU-CRF model. Experimental results indicated the proposed deep, multi-branch BiGRU-CRF model outperformed state-of-the-art models. The proposed deep, multi-branch BiGRU-CRF model constructed a large-scale geological hazard literature knowledge graph containing 34,457 entities nodes and 84,561 relations.
