Proceedings of the 18th Conference on Computational Linguistics - 2000
DOI: 10.3115/990820.990850

Extracting the names of genes and gene products with a hidden Markov model

Abstract: We report the results of a study into the use of a linear interpolating hidden Markov model (HMM) for the task of extracting technical terminology from MEDLINE abstracts and texts in the molecular-biology domain. This is the first stage in a system that will extract event information for automatically updating biology databases. We trained the HMM entirely with bigrams based on lexical and character features in a relatively small corpus of 100 MEDLINE abstracts that were marked up by (…

Cited by 176 publications (132 citation statements)
References 9 publications
“…The first step is the recognition of the protein names themselves (see e.g. [3,6,15]). As the focus of this paper is on the mining of interactions, we assume that protein name recognition has already taken place.…”
Section: Related Work (mentioning)
confidence: 99%
“…More and more relevant information is becoming available on the web, in particular in literature databases such as MEDLINE, in ontological resources such as the Gene Ontology, and in specialized structured databases such as IntAct. The unstructured information in scientific publications poses the biggest challenge to biologists who are interested in specific gene or protein interactions, as they are forced to spend a tremendous amount of time reviewing articles looking for the information they need.…”
Section: Introduction (mentioning)
confidence: 99%
“…In Fukuda et al [1998], protein names are identified in biological papers using hand-coded rules. On the other hand, in Collier et al [2000], supervised learning methods based on Hidden Markov Models are used. Subramaniam et al [2003] have developed the BioAnnotator system, which is part of the current Relation Extraction system, and uses rules and dictionary lookup for identifying and classifying biological terms.…”
Section: Knowledge Extraction From Large Text Collections Using Searc (mentioning)
confidence: 99%
“…Hidden Markov Models (HMMs) [64] can learn a lexicon and context as well by computing the probability that a sequence of specific words surround or constitute a molecule name. The expert just has to identify examples, while the HMM learns the patterns to apply to new sequences of words.…”
Section: Named Entity Tagging (mentioning)
confidence: 99%
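
The last statement above describes an HMM that learns from labelled examples which word sequences surround or constitute a molecule name. As a rough illustration only, the sketch below trains a tiny tagger of that kind and decodes with the Viterbi algorithm. It is not the cited paper's model: the toy sentences, the tag set, the weights `lam`, and every function name are assumptions made for this example. Linear interpolation appears here in its simplest form, mixing a bigram tag-transition estimate with a unigram tag frequency, and a word-emission estimate with a uniform floor so that unseen words still receive a nonzero probability.

```python
import math
from collections import Counter

# Hypothetical toy training data; NOT taken from the MEDLINE corpus of the paper.
TRAIN = [
    [("the", "O"), ("p53", "PROTEIN"), ("protein", "O"), ("binds", "O"), ("DNA", "DNA")],
    [("NF-kappaB", "PROTEIN"), ("activates", "O"), ("the", "O"), ("IL-2", "DNA"), ("promoter", "DNA")],
]
START = "<s>"  # assumed sentence-start pseudo-tag

tag_count, prev_count = Counter(), Counter()
trans_count, emit_count = Counter(), Counter()
vocab = set()
for sent in TRAIN:
    prev = START
    for word, tag in sent:
        tag_count[tag] += 1                      # unigram tag counts
        prev_count[prev] += 1                    # how often `prev` is a left context
        trans_count[(prev, tag)] += 1            # bigram tag counts
        emit_count[(tag, word.lower())] += 1     # word emitted by tag
        vocab.add(word.lower())
        prev = tag

TAGS = sorted(tag_count)
N = sum(tag_count.values())


def trans_prob(prev, cur, lam=0.8):
    """Interpolate the bigram estimate P(cur | prev) with the unigram P(cur)."""
    bigram = trans_count[(prev, cur)] / prev_count[prev] if prev_count[prev] else 0.0
    unigram = tag_count[cur] / N
    return lam * bigram + (1 - lam) * unigram


def emit_prob(tag, word, lam=0.9):
    """Interpolate P(word | tag) with a uniform floor over vocabulary plus one unknown slot."""
    ml = emit_count[(tag, word.lower())] / tag_count[tag]
    uniform = 1.0 / (len(vocab) + 1)
    return lam * ml + (1 - lam) * uniform


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy model."""
    if not words:
        return []
    best = {t: math.log(trans_prob(START, t)) + math.log(emit_prob(t, words[0]))
            for t in TAGS}
    backpointers = []
    for word in words[1:]:
        new_best, pointer = {}, {}
        for cur in TAGS:
            # Best predecessor tag for `cur` at this position.
            p = max(TAGS, key=lambda q: best[q] + math.log(trans_prob(q, cur)))
            new_best[cur] = (best[p] + math.log(trans_prob(p, cur))
                             + math.log(emit_prob(cur, word)))
            pointer[cur] = p
        best = new_best
        backpointers.append(pointer)
    path = [max(TAGS, key=lambda t: best[t])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


print(viterbi("the p53 protein binds DNA".split()))
# ['O', 'PROTEIN', 'O', 'O', 'DNA']
```

Because both interpolated estimates stay strictly positive, the log-space Viterbi recursion never takes the logarithm of zero, which is the practical reason for smoothing when the training corpus is small.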