2001
DOI: 10.1093/bioinformatics/17.suppl_1.s97

Disambiguating proteins, genes, and RNA in text: a machine learning approach

Abstract: We present an automated system for assigning protein, gene, or mRNA class labels to biological terms in free text. Three machine learning algorithms and several extended ways of defining contextual features for disambiguation are examined, and a fully unsupervised method for obtaining training examples is proposed. We train and evaluate our system over a collection of 9 million words of molecular biology journal articles, obtaining accuracy rates up to 85%.
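
The abstract names the task but not the exact feature definitions. As a rough illustration of context-based disambiguation in general, the following sketch trains a multinomial naive Bayes classifier on bag-of-words context windows around a term and predicts protein, gene, or mRNA. The window size of 3, the add-one smoothing, the helper names `context_features` and `NaiveBayes`, and the toy sentences are all assumptions made for this example; they are not the authors' features, data, or implementation.

```python
import math
from collections import Counter, defaultdict

def context_features(tokens, index, window=3):
    """Lowercased words within `window` positions of tokens[index], excluding the term itself."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return [t.lower() for i, t in enumerate(tokens[lo:hi], start=lo) if i != index]

class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of training examples per label
        self.word_counts = defaultdict(Counter)  # per-label context-word frequencies
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (feature_list, label) pairs."""
        for features, label in examples:
            self.class_counts[label] += 1
            self.word_counts[label].update(features)
            self.vocab.update(features)
        return self

    def log_posterior(self, features, label):
        """Unnormalized log P(label | features): log prior plus smoothed log likelihoods."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for w in features:
            score += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, features):
        return max(self.class_counts, key=lambda c: self.log_posterior(features, c))

# Toy training data (invented): (sentence, index of the ambiguous term, label).
TRAIN = [
    ("kinase phosphorylated p53 at serine", 2, "protein"),
    ("purified p53 was phosphorylated", 1, "protein"),
    ("the BRCA1 locus was mapped", 1, "gene"),
    ("cloned BRCA1 on chromosome 17", 1, "gene"),
    ("northern blot detected MYC transcripts", 3, "mRNA"),
    ("MYC transcripts were abundant", 0, "mRNA"),
]

model = NaiveBayes().fit(
    (context_features(sentence.split(), idx), label) for sentence, idx, label in TRAIN
)

query = "p21 was phosphorylated".split()
print(model.predict(context_features(query, 0)))  # -> protein
```

Naive Bayes is used here only because it fits in a few lines; swapping in a decision tree or a rule learner would change the model, not the context-window feature extraction.
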

Cited by 162 publications (124 citation statements); references 15 publications (19 reference statements).

“…Hatzivassiloglou and Duboué [19] used three supervised learning techniques, C4.5 decision trees, naïve Bayes, and inductive learning. They tested different features with an automatically created gold standard to distinguish between genes, proteins, and mRNA.…”
Section: Information Sources (mentioning; confidence: 99%)

“…Alas, compiling such gold standards is time-consuming and difficult. Some researchers have built gold standards automatically [16,19,21] to sidestep the difficulty of finding experts to create them. These standards are an excellent approach to comparing different algorithms.…”
Section: Research Question (mentioning; confidence: 99%)

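
Building a gold standard automatically avoids the need for expert annotators. One simple heuristic of this kind, assumed here purely for illustration (not necessarily the exact procedure of the cited works), is to harvest occurrences where a term is immediately followed by an explicit class word such as "gene", "protein", or "mRNA", take that word as the label, and delete it so a classifier must rely on the remaining context. The cue-word table and the function name `auto_label` are hypothetical.

```python
# Hypothetical cue words mapping an explicit class word to a label.
CUE_WORDS = {
    "gene": "gene",
    "genes": "gene",
    "protein": "protein",
    "proteins": "protein",
    "mrna": "mRNA",
}

def auto_label(tokens):
    """Return (tokens_without_cue, term_index, label) for every term followed by a cue word."""
    examples = []
    for i in range(len(tokens) - 1):
        label = CUE_WORDS.get(tokens[i + 1].lower())
        # Skip positions without a following cue, and cases where the "term" is itself a cue word.
        if label is None or tokens[i].lower() in CUE_WORDS:
            continue
        # Drop the cue word so it cannot leak the label into the context features.
        stripped = tokens[: i + 1] + tokens[i + 2 :]
        examples.append((stripped, i, label))
    return examples

for example in auto_label("BRCA1 mRNA and BRCA1 gene".split()):
    print(example)
# (['BRCA1', 'and', 'BRCA1', 'gene'], 0, 'mRNA')
# (['BRCA1', 'mRNA', 'and', 'BRCA1'], 3, 'gene')
```

Each harvested example removes only its own cue word; other cue words in the same sentence remain in the context, which a fuller pipeline might also want to mask.
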
“…A promising approach to handle the resulting information overload is to automate the process of knowledge extraction using data mining techniques, thereby extracting novel information and relationships between biological features (Fielding 1999; Hatzivassiloglou et al. 2001). Machine learning techniques permit the building of models for a given classification task.…”
Section: Introduction (mentioning; confidence: 99%)

“…Although this is often probably true, it is not guaranteed and may lead to misinterpretations. In a test, experts agreed in only 78% of cases if a sentence was about DNA, mRNA, or protein [11]. This kind of distinction should be crystal clear.…”
Section: Why Bother? (mentioning; confidence: 99%)
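
The 78% figure in the statement above reads as a simple percent-agreement rate. Because raw agreement is inflated by chance when one label dominates, annotation studies often also report Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected if both annotators labeled independently at their own label frequencies. The sketch below computes both measures; the annotator labels are invented toy data, not taken from [11].

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which two annotators assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # and this sketch simply returns 1.0 in that case.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Invented toy annotations for five sentences.
annotator_1 = ["protein", "protein", "mRNA", "DNA", "protein"]
annotator_2 = ["protein", "mRNA", "mRNA", "DNA", "protein"]
print(percent_agreement(annotator_1, annotator_2))  # -> 0.8
print(round(cohens_kappa(annotator_1, annotator_2), 4))  # -> 0.6875
```

In this toy case the annotators agree on 80% of items, but kappa is noticeably lower because about a third of that agreement would be expected by chance alone.
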