CRFs based parallel biomedical named entity recognition algorithm employing MapReduce framework

Tang, Zhuo; Jiang, Lingang; Yang, Li; Li, Kenli; Li, Keqin

doi:10.1007/s10586-015-0426-z

Cited by 24 publications

(7 citation statements)

References 33 publications

“…In the first phase boundaries of entities are identified while in the second phase semantic labeling is performed to label the detected entities. A CRF based system has been proposed by (Tang et al, 2015), where in the first step boundaries of NEs are identified and in the second step appropriate labels are assigned. (Grouin, 2014) performed experiments on the i2b2/VA-2010 challenge dataset to detect bacteria and biotopes names.…”

Section: Related Work (mentioning)

confidence: 99%

Entity Extraction in Biomedical Corpora: An Approach to Evaluate Word Embedding Features with PSO based Feature Selection

Yadav, Ekbal, Saha et al. 2017

Proceedings of the 15th Conference of the European Chapter of The Association for Computational Linguistics: Volume 1

Text mining has drawn significant attention in the recent past due to the rapid growth in biomedical and clinical records. Entity extraction is one of the fundamental components of biomedical text mining. In this paper, we propose a novel approach to feature selection for entity extraction that exploits the concepts of deep learning and Particle Swarm Optimization (PSO). The system utilizes word embedding features along with several other features extracted by studying the properties of the datasets. We observe that compact word embedding features, as determined by PSO, are more effective for entity extraction than the entire word embedding feature set. The proposed system is evaluated on three benchmark biomedical datasets: GENIA, GENETAG, and AiMed. The effectiveness of the proposed approach is evident from significant performance gains over the baseline models as well as other existing systems. We observe improvements of 7.86%, 5.27%, and 7.25% F-measure points over the baseline models for the GENIA, GENETAG, and AiMed datasets, respectively.

“…For most deep neural networks-based NER methods, chain CRF [10] acts as the tag decoder. However, as an alternative, recurrent neural networks (RNNs) can be also used for decoding tags of sequences [11][12][13].…”

Section: Related Work (mentioning)

confidence: 99%

Adversarial Active Learning for Named Entity Recognition in Cybersecurity

Li, Hu, Ju et al. 2020

Computers, Materials & Continua

Owing to the continuous barrage of cyber threats, there is a massive amount of cyber threat intelligence. However, a great deal of cyber threat intelligence comes from textual sources. For analysis of cyber threat intelligence, many security analysts rely on cumbersome and time-consuming manual efforts. A cybersecurity knowledge graph plays a significant role in the automatic analysis of cyber threat intelligence. As the foundation for constructing a cybersecurity knowledge graph, named entity recognition (NER) is required for identifying critical threat-related elements from textual cyber threat intelligence. Recently, deep neural network-based models have attained very good results in NER. However, the performance of these models relies heavily on the amount of labeled data. Since labeled data in cybersecurity is scarce, in this paper, we propose an adversarial active learning framework to effectively select the informative samples for further annotation. In addition, leveraging the long short-term memory (LSTM) network and the bidirectional LSTM (BiLSTM) network, we propose a novel NER model by introducing a dynamic attention mechanism into the BiLSTM-LSTM encoder-decoder. With the selected informative samples annotated, the proposed NER model is retrained. As a result, the performance of the NER model is incrementally enhanced with low labeling cost. Experimental results show the effectiveness of the proposed method.

“…It is worth mentioning that given the large amount of biomedical documents and texts that need to be processed by NER tools, several researchers have looked at optimizing the parallel capabilities of these tools. The work by Tang et al [53] and Li et al [54] are two notable recent work in this respect. These two works contend that given the sequential nature of CRF models, their parallelization is not trivial.…”

Section: Entity-specific Biomedical Annotation Tools (mentioning)

confidence: 99%

Semantic annotation in biomedicine: the current landscape

Jovanović, Bagheri 2017

J Biomed Semant

The abundance and unstructured nature of biomedical texts, be it clinical or research content, impose significant challenges for the effective and efficient use of information and knowledge stored in such texts. Annotation of biomedical documents with machine-intelligible semantics facilitates advanced, semantics-based text management, curation, indexing, and search. This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS. As a result, the meaning of those mentions is unambiguously and explicitly defined, and thus made readily available for automated processing. This process is widely known as semantic annotation, and the tools that perform it are known as semantic annotators. Over the last dozen years, the biomedical research community has invested significant efforts in the development of biomedical semantic annotation technology. Aiming to establish grounds for further developments in this area, we review a selected set of state-of-the-art biomedical semantic annotators, focusing particularly on general-purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine. We also examine potential directions for further improvements of today’s annotators which could make them even more capable of meeting the needs of real-world applications. To motivate and encourage further developments in this area, along the suggested and/or related directions, we review existing and potential practical applications and benefits of semantic annotators.
