Proceedings of the 2005 ACM Symposium on Applied Computing 2005
DOI: 10.1145/1066677.1066722

Optimizing syntax patterns for discovering protein-protein interactions

Abstract: We propose a method for automated extraction of protein-protein interactions from scientific text. Our system matches sentences against syntax patterns typically describing protein interactions. We define a set of 22 patterns, each a regular expression consisting of anchor positions and parameterizable constraints. This small set is then refined and optimized using a genetic algorithm on a training set. No heuristic definitions are necessary, and the final pattern set can be generated completely without manual …

Citations: cited by 23 publications (22 citation statements)
References: 15 publications (19 reference statements)
“…We construct a lexicon called iLexicon that consists of interaction nouns and verbs similar to the ones proposed by Plake et al (2005). Then we refine and extend iLexicon based on the training data of Data FEBS .…”
Section: Coreference Resolution (mentioning)
confidence: 99%
“…These gaps are limited in length but they do not require particular words. As recommend in Plake et al (2005) we have set the maximum length of the gaps equal to 5. …”
Section: Word Gaps (mentioning)
confidence: 99%
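
The gap constraint described in the statement above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the pattern set from Plake et al (2005) or from the citing paper: it assumes protein names have already been replaced by a hypothetical placeholder token `PROTEIN`, and the verb list and the helper names `build_pattern` and `matches` are invented for the example. Only the maximum gap length of 5 comes from the quoted text.

```python
import re

MAX_GAP = 5  # maximum gap length, taken from the quoted statement


def build_pattern(verbs, max_gap=MAX_GAP):
    """Build a regex: PROTEIN, a gap, an interaction verb, a gap, PROTEIN.

    A gap is 0..max_gap whitespace-separated tokens of any content,
    so gaps are limited in length but do not require particular words.
    """
    gap = r"(?:\S+\s+){0,%d}?" % max_gap
    verb = "|".join(re.escape(v) for v in verbs)
    return re.compile(
        r"\bPROTEIN\s+" + gap + r"(?:" + verb + r")\s+" + gap + r"PROTEIN\b"
    )


def matches(sentence, pattern):
    """Return True if the sentence contains a span accepted by the pattern."""
    return pattern.search(sentence) is not None


# Hypothetical verb list, for illustration only.
pattern = build_pattern(["binds", "activates"])

short_gap = "PROTEIN directly binds to the PROTEIN"
long_gap = "PROTEIN a b c d e f binds PROTEIN"  # six filler words before the verb

print(matches(short_gap, pattern))
print(matches(long_gap, pattern))
```

With the default limit, the first sentence is accepted and the second is rejected because six words separate the first placeholder from the verb; raising `max_gap` to 6 would accept it.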
“…We also performed the classification task on unseen data set and compared the results to the output from conventional ML algorithms. Table 1 reports precision rates at top N th rank sentences for Biocreative I corpus (BC) [13], Christine Brun corpus (CB) [10] and Negative Cases of Protein-Protein interaction corpus (N-PPI) [14], respectively. The hypernetwork classifier outputs acceptable results compared to SVM with tree kernel, naive Bayes, and SVM with RBF kernels.…”
Section: PPI Sentence Classification (mentioning)
confidence: 99%
“…Co-occurrence analysis is the most straightforward approach and generally results in high recall but low precision [9,10]. Some other approaches construct patterns specifying how an interaction is described in literature and use them as rules to find PPIs [11][12][13][14][15][16]. Rule or pattern-based approaches can increase precision but significantly lower recall.…”
Section: Introduction (mentioning)
confidence: 99%
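
The precision and recall trade-off described in the statement above can be illustrated with a toy comparison. Everything in this sketch is an assumption made for the example: the four sentences and their labels are invented, `PROTEIN` is a hypothetical placeholder for a recognized protein name, and the single regular expression stands in for a pattern set. It does not reproduce any result from the cited papers.

```python
import re

# Invented toy data: (sentence, does it describe an interaction?)
SENTENCES = [
    ("PROTEIN binds PROTEIN", True),
    ("PROTEIN forms a complex with PROTEIN", True),
    ("PROTEIN and PROTEIN were both measured", False),
    ("PROTEIN was purified", False),
]

# Hypothetical stand-in for a pattern set.
PATTERN = re.compile(r"\bPROTEIN (?:binds|activates) PROTEIN\b")


def cooccurrence(sentence):
    """Predict an interaction whenever two or more proteins co-occur."""
    return sentence.count("PROTEIN") >= 2


def pattern_based(sentence):
    """Predict an interaction only when the explicit pattern matches."""
    return PATTERN.search(sentence) is not None


def precision_recall(predict):
    """Compute (precision, recall) of a predictor on the toy data."""
    tp = sum(1 for s, gold in SENTENCES if predict(s) and gold)
    fp = sum(1 for s, gold in SENTENCES if predict(s) and not gold)
    fn = sum(1 for s, gold in SENTENCES if not predict(s) and gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print("co-occurrence:", precision_recall(cooccurrence))
print("pattern-based:", precision_recall(pattern_based))
```

On this toy data, co-occurrence finds both true interactions but also flags the sentence where two proteins are merely listed together, while the pattern makes no false predictions but misses the interaction phrased as "forms a complex with".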