Jun’ichi Tsujii scite author profile

The paper presents the design and implementation of the BioNLP'09 Shared Task, and reports the final results with analysis. The shared task consists of three sub-tasks, each of which addresses bio-molecular event extraction at a different level of specificity. The data was developed based on the GENIA event corpus. The shared task was run over 12 weeks, drawing initial interest from 42 teams. Of these teams, 24 submitted final results. The evaluation results are encouraging, indicating that state-of-the-art performance is approaching a practically applicable level and revealing some remaining challenges.

Corpus annotation for mining biomedical events from literature

Kim

2008

Background: Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation.

Developing a Robust Part-of-Speech Tagger for Biomedical Text

et al. 2005

Probabilistic CFG with latent annotations

2005

This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which non-terminal symbols are augmented with latent variables. Fine-grained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model using an EM algorithm. Because exact parsing with a PCFG-LA is NP-hard, several approximations are described and empirically compared. In experiments using the Penn WSJ corpus, our automatically trained model gave a performance of 86.6% (F1, sentences ≤ 40 words), which is comparable to that of an unlexicalized PCFG parser created using extensive manual feature selection.

Tuning support vector machines for biomedical named entity recognition

Kazama

Makino

Ohta

et al. 2002

181

142

We explore the use of Support Vector Machines (SVMs) for biomedical named entity recognition. To make the SVM training with the largest available corpus, the GENIA corpus, tractable, we propose to split the non-entity class into sub-classes, using part-of-speech information. In addition, we explore new features such as word cache and the states of an HMM trained by unsupervised learning. Experiments on the GENIA corpus show that our class splitting technique not only enables the training with the GENIA corpus but also improves the accuracy. The proposed new features also contribute to improving the accuracy. We compare our SVM-based recognition system with a system using the Maximum Entropy tagging method.

Event extraction for systems biology by text mining the literature

Ananiadou

Pyysalo

Tsujii

et al. 2010

Trends in Biotechnology

177

131

Extracting the names of genes and gene products with a hidden Markov model

2000

We report the results of a study into the use of a linear interpolating hidden Markov model (HMM) for the task of extracting technical terminology from MEDLINE abstracts and texts in the molecular-biology domain. This is the first stage in a system that will extract event information for automatically updating biology databases. We trained the HMM entirely with bigrams based on lexical and character features in a relatively small corpus of 100 MEDLINE abstracts that were marked up by domain experts with term classes such as proteins and DNA. Using cross-validation methods we achieved an F-score of 0.73 and we examine the contribution made by each part of the interpolation model to overcoming data sparseness.

Jun’ichi Tsujii

GENIA corpus—a semantically annotated corpus for bio-textmining

Overview of BioNLP'09 shared task on event extraction
