2005
DOI: 10.1186/1471-2105-6-s1-s9

Systematic feature evaluation for gene name recognition

Abstract: In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre-o…
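
The sliding-window idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' feature set: the feature names, the window size, and the `<pad>` symbol are all hypothetical. Each token is described by its own surface features plus those of its neighbours; the resulting feature dictionaries would then be vectorized and passed to a classifier such as a Support Vector Machine, which is not shown here.

```python
from typing import Dict, List

# Hypothetical padding symbol for window positions outside the sentence.
PAD = "<pad>"


def token_features(token: str) -> Dict[str, object]:
    """Illustrative surface features for one token (assumed, not from the paper)."""
    return {
        "lower": token.lower(),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_upper": any(ch.isupper() for ch in token),
        "suffix3": token[-3:].lower(),
    }


def window_features(tokens: List[str], i: int, size: int = 2) -> Dict[str, object]:
    """Combine the features of token i with those of its neighbours.

    Feature keys are prefixed with the signed offset from the centre token,
    e.g. "-1:lower" for the previous token and "+0:lower" for the token itself.
    """
    feats: Dict[str, object] = {}
    for offset in range(-size, size + 1):
        j = i + offset
        tok = tokens[j] if 0 <= j < len(tokens) else PAD
        for name, value in token_features(tok).items():
            feats[f"{offset:+d}:{name}"] = value
    return feats


def sentence_windows(tokens: List[str], size: int = 2) -> List[Dict[str, object]]:
    """Return one feature dictionary per token, ready for vectorization."""
    return [window_features(tokens, i, size) for i in range(len(tokens))]


if __name__ == "__main__":
    example = ["The", "p53", "protein", "binds", "DNA"]
    for tok, feats in zip(example, sentence_windows(example, size=1)):
        print(tok, feats)
```

A per-token classifier trained on such windows labels each word as part of a gene or protein name or not; joining adjacent positive tokens into phrases is a separate post-processing step.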

Citations: cited by 25 publications (9 citation statements)
References: 15 publications (17 reference statements)
“…Transmembrane domains were predicted by TMHMM as described in reference 65. Graphic alignments were calculated by GATA using default settings (66). Amino acid sequences of SAgs were aligned with ClustalW, and the neighbor joining tree was calculated with MEGA4 (67) with 1,000 bootstrap replicates, based on p distances and after pairwise deletion of gaps.…”
Section: Methods (mentioning)
confidence: 99%
“…An extensive number of customized kernels (functions that quantify the similarity between sequences) have been proposed and are more effective than general-purpose kernels (e.g., polynomial and radial basis function kernels) [11]. Classifiers developed to explicitly optimize the area under ROC curve (AUC) have been shown to be more useful than to optimize the classification error when the AUC is the performance metric of interest [12], and ensemble learning methods (e.g., boosting and bagging) have been shown effective in improving the performance of a single learning algorithm [13]. Now, with the aid of bioinformatics and machine learning, we are able to narrow protein of our interest, predict epitope using computation, increase allergy diagnostic accuracy, and drastically reduce the number of wet lab experiments.…”
Section: Fig 1 a Schematic Diagram Representing The Ige-mediated All (mentioning)
confidence: 99%
“…Essentially, we use a token-based SVM classifier with expansion to noun-phrases to identify names referring to proteins [5]. This tagger was trained on the BioCreAtIvE Task 1A data [1,15].…”
Section: Named Entity Recognition (mentioning)
confidence: 99%