Towards Large Scale Semantic Annotation Built on MapReduce Architecture

Laclavik, Michal; Seleng, Martin; Hluchý, Ladislav

doi:10.1007/978-3-540-69389-5_38

Cited by 21 publications

(25 citation statements)

References 9 publications

“…Now it is popular in text mining for various applications [18], especially Natural Language Processing (NLP) and Machine Learning (ML), as the MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications [19]. Laclavik et al. presented a pattern-based annotation tool built on the MapReduce architecture to process large amounts of text data [20]. Lin and Dyer discussed methods for data-intensive text processing based on MapReduce, such as parallelization of the EM algorithm and the HMM model [4].…”

Section: Related Work (mentioning)

confidence: 99%

CRFs based parallel biomedical named entity recognition algorithm employing MapReduce framework

et al. 2015

With the rapid growth of the biomedical literature, model training time in biomedical named entity recognition increases sharply when dealing with large-scale training samples. How to increase the efficiency of named entity recognition on biomedical big data has become one of the key problems in biomedical text mining. To improve recognition performance and reduce training time, this paper proposes an optimization method for two-phase recognition using conditional random fields. In the first stage, each named entity boundary is detected to distinguish all real entities. In the second stage, the semantic class of each detected entity is labeled. To speed up training in both phases, we implement the model training process on a parallel optimization framework based on MapReduce. The training set is divided into several parts, and the iterations of the training algorithm are designed as map tasks that can be executed simultaneously in a cluster, where each map function computes a gradient vector component for its part of the training set. Our experiments show that the proposed method achieves high performance with short training time, which has important implications for current biological big data processing.

“…They commented that UIMA and GATE would benefit from adopting MapReduce. Laclavik et al. [16] demonstrated using Ontea [17] with Hadoop. This study presents GATECloud.net: the adaptation of the GATE infrastructure to the cloud, following the PaaS paradigm. It enables researchers to run their NLP applications without the significant overheads of re-implementing their algorithms for MapReduce and understanding Amazon's IaaS APIs.…”

Section: Large-scale Text Mining and Compute Clouds (mentioning)

confidence: 99%

GATECloud.net: a platform for large-scale, open-source text processing on the cloud

Tablan, Roberts, Cunningham et al. 2013

Phil. Trans. R. Soc. A.

Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.

“…Interested fellows leave traces in the digital space, sometimes even without being aware of it. For example: evaluations, recommendations, annotations, inscriptions on a virtual wall [45,46]. Interested fellows communicate with others, forming communities of those sharing interests.…”

Section: Cognitive Traveling In Digital Space (mentioning)

confidence: 99%

Cognitive traveling in digital space: from keyword search through exploratory information seeking

Návrat 2012

Open Computer Science

This paper surveys the principal concepts involved in various approaches to web search. There are many attempts to improve keyword search. There is the concept of exploratory search, which represents a shift towards a more complex view of the interested fellow's role, widening her options. We propose a more radical shift towards viewing information seeking as cognitive traveling in the digital information space involving both the web and digital libraries.
