Comparative experiments on learning information extractors for proteins and their interactions

Bunescu, Răzvan; Ge, Ruifang; Kate, Rohit J.; Marcotte, Edward M.; Mooney, Raymond J.; Ramani, Arun; Wong, Yuk Wah

doi:10.1016/j.artmed.2004.07.016

Cited by 297 publications

(234 citation statements)

References 16 publications

Mentioning: 225

“…Consistently, recent trends regarding the application of machine learning to biological IE head toward the development of public annotated corpora, targeting such binary relations to compare systems' performances (e.g. AIMed [29], Bioinfer [30], HPRD50 [10], LLL [9]). In this paper, the ontology does not limit us to the extraction of a single relation, but allows the definition of numerous relations.…”

Section: Results (mentioning)

confidence: 99%

Extraction of Genic Interactions with the Recursive Logical Theory of an Ontology

Manine, Alphonse, Bessières (2010)

Computational Linguistics and Intelligent Text Processing

Abstract. We introduce an Information Extraction (IE) system that uses the logical theory of an ontology as a generalisation of typical information extraction patterns to extract biological interactions from text. This provides inference capabilities beyond current approaches: first, our system is able to handle multiple relations; second, it handles dependencies between relations by deriving new relations from previously extracted ones and using inference at a semantic level; third, it addresses recursive or mutually recursive rules. In this context, automatically acquiring the resources of an IE system becomes an ontology learning task: terms, synonyms, conceptual hierarchy, relational hierarchy, and the logical theory of the ontology have to be acquired. We focus on the last point, as learning the logical theory of an ontology, and a fortiori of a recursive one, remains a seldom studied problem. We validate our approach by using a relational learning algorithm, which handles recursion, to learn a recursive logical theory from a text corpus on the bacterium Bacillus subtilis. This theory achieves good recall and precision for the ten defined semantic relations, reaching a global recall of 67.7% and a precision of 75.5%, but more importantly, it captures complex mutually recursive interactions which were implicitly encoded in the ontology.

“…These include the protein-interaction datasets from Ray and Craven (2001) and from Bunescu et al (2005), and we have reported elsewhere results on a recent Learning Language and Logic challenge task dataset (Goadrich et al, 2005). Other datasets outside of IE where we believe Gleaner will be useful include the nuclear smuggling dataset from Tang et al (2003), the social network dataset from Taskar et al (2003), the CiteSeer citation dataset from Popescul et al (2003), and the university relation dataset from Richardson and Domingos (2006).…”

Section: Discussion (mentioning)

confidence: 99%

“…Recently, biomedical journal articles have been a major source of interest in the IE community for a number of reasons: the amount of data available is enormous; the objects, proteins and genes, do not have standard naming conventions; and there is interest from biomedical practitioners to quickly find relevant information (Blaschke et al, 2002; Shatkay and Feldman, 2003; Ray and Craven, 2001; Bunescu et al, 2005). We have focused on learning multi-slot protein localization from Medline abstracts, where the task is to identify links between phrases which correspond to a protein and the location of that particular protein in a cell.…”

Section: Information Extraction (mentioning)

confidence: 99%

“…His results are limited to a small dataset, and recall-precision results are only given for one point as opposed to our analysis using a curve. Bunescu et al (2005) propose the use of Extraction using Longest Common Subsequence (ELCS), a bottom-up approach to finding protein interactions with rule templates for sentences. They use a greedy covering algorithm to repeatedly generalize sentence templates until enough templates are found to cover most positive examples.…”

Section: Information Extraction (mentioning)

confidence: 99%
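
The citation statement above describes a bottom-up approach in which sentence templates are generalized and collected by a greedy covering loop. The following is a minimal sketch of that general idea, not the ELCS implementation from Bunescu et al: it assumes whitespace-tokenized sentences with protein names already replaced by a `PROT` placeholder, treats a template as the longest common subsequence of two positive sentences, and rejects any template that also matches a negative sentence. The function names, the `min_len` parameter, and the toy sentences are all hypothetical.

```python
from typing import List, Sequence


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest common subsequence of two token sequences (dynamic programming)."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover one LCS.
    out: List[str] = []
    i = j = 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def matches(template: Sequence[str], sentence: Sequence[str]) -> bool:
    """True if the template tokens occur, in order, within the sentence."""
    remaining = iter(sentence)
    return all(token in remaining for token in template)


def greedy_cover(positives: List[List[str]],
                 negatives: List[List[str]],
                 min_len: int = 2) -> List[List[str]]:
    """Repeatedly keep the template that covers the most uncovered positives."""
    uncovered = list(range(len(positives)))
    templates: List[List[str]] = []
    while uncovered:
        best, best_cov = None, []
        for x in uncovered:
            for y in range(len(positives)):
                if x == y:
                    continue
                cand = lcs(positives[x], positives[y])
                # Assumption: discard short templates and any that hit a negative.
                if len(cand) < min_len or any(matches(cand, n) for n in negatives):
                    continue
                cov = [k for k in uncovered if matches(cand, positives[k])]
                if len(cov) > len(best_cov):
                    best, best_cov = cand, cov
        if best is None:  # nothing acceptable is left to generalize
            break
        templates.append(best)
        uncovered = [k for k in uncovered if k not in best_cov]
    return templates


# Toy data (hypothetical), with protein mentions replaced by PROT.
POSITIVES = [s.split() for s in (
    "PROT interacts with PROT",
    "PROT strongly interacts with PROT in vivo",
    "PROT binds to PROT",
    "PROT binds tightly to PROT",
)]
NEGATIVES = [s.split() for s in ("PROT and PROT were purified",)]

if __name__ == "__main__":
    for template in greedy_cover(POSITIVES, NEGATIVES):
        print(" ".join(template))
```

On the toy data, the over-general template `PROT ... PROT` is rejected because it also matches the negative sentence, so the loop keeps one template per interaction verb instead.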

Gleaner: Creating ensembles of first-order clauses to improve recall-precision curves

2006

Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. The goal of our research is to find new approaches within ILP particularly suited for large, highly-skewed domains. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an "at least L of these K clauses" thresholding method to combine sets of selected clauses. Our research focuses on Multi-Slot Information Extraction (IE), a task that typically involves many more negative examples than positive examples. We formulate this problem into a relational domain, using two large testbeds involving the extraction of important relations from the abstracts of biomedical journal articles. We compare Gleaner to ensembles of standard theories learned by Aleph, finding that Gleaner produces comparable testset results in a fraction of the training time.
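
The "at least L of these K clauses" rule mentioned in the abstract can be illustrated with a small sketch. This is not Gleaner itself: the randomized clause search is omitted, the three clauses are hypothetical boolean tests over a feature dictionary, and the six labelled examples are invented for illustration. The sketch only shows how raising L trades recall for precision.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, bool]
Clause = Callable[[Example], bool]


def at_least_l_of_k(clauses: List[Clause], example: Example, l: int) -> bool:
    """Predict positive when at least l of the k clauses fire on the example."""
    votes = sum(1 for clause in clauses if clause(example))
    return votes >= l


def precision_recall(clauses: List[Clause],
                     examples: List[Example],
                     labels: List[bool],
                     l: int) -> Tuple[float, float]:
    """Precision and recall of the thresholded ensemble on labelled examples."""
    tp = fp = fn = 0
    for example, label in zip(examples, labels):
        predicted = at_least_l_of_k(clauses, example, l)
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical clauses: each one simply tests a single boolean feature.
CLAUSES: List[Clause] = [
    lambda ex: ex["a"],
    lambda ex: ex["b"],
    lambda ex: ex["c"],
]

# Invented toy data: three positives followed by three negatives.
EXAMPLES: List[Example] = [
    {"a": True, "b": True, "c": True},
    {"a": True, "b": True, "c": False},
    {"a": True, "b": False, "c": False},
    {"a": False, "b": False, "c": True},
    {"a": False, "b": True, "c": True},
    {"a": False, "b": False, "c": False},
]
LABELS = [True, True, True, False, False, False]

if __name__ == "__main__":
    for l in range(1, len(CLAUSES) + 1):
        p, r = precision_recall(CLAUSES, EXAMPLES, LABELS, l)
        print(f"L={l}: precision={p:.2f} recall={r:.2f}")
```

Sweeping L from 1 to K on this toy data moves from full recall with lower precision (L=1) to full precision with lower recall (L=3), which gives one point per threshold on a recall-precision curve.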

“…This is why along ROC curves analysis we validate our binary classifier with the help of Precision-Recall (PR) curves. PR curves have been mentioned as an alternative to ROC curves for tasks with a large skew in the class distribution (Craven, 2005; Bunescu et al, 2005; Goadrich et al, 2004). Indeed, when the proportion of negative samples is much greater than that of the positive ones, a large change in the fraction of false positives can lead to a minimal change in the false positive rate of the ROC analysis, because they are underrepresented in the test set.…”

Section: Assessing the Classifier Performance (mentioning)

confidence: 99%
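
The skew argument in the statement above can be checked with a short worked example. The counts are invented for illustration (100 positives, 10,000 negatives, 80 true positives), and the helper names are hypothetical. Doubling the false positives from 100 to 200 moves the ROC false positive rate only from 0.01 to 0.02, while precision drops from about 0.44 to about 0.29.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): the x-axis of a ROC curve."""
    return fp / (fp + tn)


def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP): the y-axis of a PR curve."""
    return tp / (tp + fp)


# Invented, heavily skewed test set: 100 positives and 10,000 negatives.
N_NEG = 10_000
TP = 80  # both hypothetical classifiers recover the same 80 positives

if __name__ == "__main__":
    for fp in (100, 200):
        tn = N_NEG - fp
        print(
            f"FP={fp}: FPR={false_positive_rate(fp, tn):.3f} "
            f"precision={precision(TP, fp):.3f}"
        )
```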

Decision Making from Confidence Measurement on the Reward Growth using Supervised Learning - A Study Intended for Large-scale Video Games

Taralla, Qiu, Sutera, et al. (2016)

Proceedings of the 8th International Conference on Agents and Artificial Intelligence

Abstract: Video games have become more and more complex over the past decades. Today, players wander in visually and option-rich environments, and each choice they make, at any given time, can have a combinatorial number of consequences. However, modern artificial intelligence is still usually hard-coded, and as the game environments become increasingly complex, this hard-coding becomes exponentially difficult. Recent research has started to let video game autonomous agents learn instead of being taught, which makes them more intelligent. This contribution falls under this very perspective, as it aims to develop a framework for the generic design of autonomous agents for large-scale video games. We consider a class of games for which expert knowledge is available to define a state quality function that gives how close an agent is to its objective. The decision making policy is based on a confidence measurement on the growth of the state quality function, computed by a supervised learning classification model. Additionally, no stratagems aiming to reduce the action space are used. As a proof of concept, we tested this simple approach on the collectible card game Hearthstone and obtained encouraging results.
