Text mining approaches for automated ontology-based curation of biological and biomedical literature have largely focused on syntactic and lexical analysis along with machine learning. Recent advances in deep learning have shown increased accuracy for textual data annotation. However, the application of deep learning to ontology-based curation is a relatively new area, and prior work has focused on a limited set of models. Here, we introduce a new deep learning architecture that combines multiple Gated Recurrent Units (GRUs) with a character+word based input. We use data from five ontologies in the CRAFT corpus as a Gold Standard to evaluate our model's performance. We also compare our model to seven models from prior work. We use four metrics, Precision, Recall, F1 score, and a semantic similarity metric (Jaccard similarity), to compare our model's output to the Gold Standard. Our model achieved 84% Precision, 84% Recall, an 83% F1 score, and 84% Jaccard similarity. Results show that our GRU-based model outperforms prior models across all five ontologies. We also observed that character+word inputs result in higher performance across models as compared to word-only inputs. These findings indicate that deep learning algorithms are a promising avenue to be explored for automated ontology-based curation of data. This study also serves as a formal comparison and guideline for building and selecting deep learning models and architectures for ontology-based curation.
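The abstract does not spell out how the semantic similarity metric is computed; a common formulation is the token- or concept-level Jaccard similarity between the predicted and Gold Standard annotation sets. A minimal sketch (the function name and example concept IDs are illustrative, not taken from the paper):

```python
def jaccard_similarity(predicted, gold):
    """Jaccard similarity: |intersection| / |union| of two annotation sets."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # both empty: treat as a perfect match
    return len(predicted & gold) / len(predicted | gold)

# Example: predicted vs. Gold Standard ontology concept IDs for one passage
pred = {"GO:0008150", "GO:0003674", "CL:0000000"}
gold = {"GO:0008150", "CL:0000000", "CHEBI:24431"}
print(jaccard_similarity(pred, gold))  # 2 shared / 4 in union = 0.5
```

Unlike exact-match Precision/Recall, this set-based score gives partial credit when a model recovers some but not all of the concepts annotated in a passage.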
II. INTRODUCTION

Ontology-based data representation has been widely adopted in data-intensive fields such as biology and biomedicine due to the need for large-scale, computationally amenable data [1]. However, the majority of ontology-based data generation relies on manual literature curation, a slow and tedious process [2]. Natural language processing and text mining methods have been developed as the solution for scalable ontology-based data curation [3,4].

One of the most important tasks for annotating scientific literature with ontology concepts is Named Entity Recognition (NER). In the context of ontology-based annotation, NER can be described as recognizing ontology concepts from text [5]. Outside the scope of ontology-based annotation, NER has been applied to biomedical and biological literature for recognizing genes, proteins, diseases, etc. [5].

The large majority of ontology-driven NER techniques rely on lexical and syntactic analysis of text in addition to machine learning for recognizing and tagging ontology concepts [3,4,6]. In recent years, deep learning has been introduced for NER of biological entities from literature [7,8,9,10,11]. However, the majority of prior work has focused on a limited set of models, particularly the Long Short-Term Memory (LSTM) model (e.g. [7]).

Here, we present a new deep learning architecture that utilizes Gated Recurrent Units (GRUs) while taking advantage of word and character encodings from the annotation training data to recognize ontology concepts from text. We evaluate our model in...
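The GRU unit at the core of such architectures can be illustrated with a minimal scalar sketch of the standard gate equations (the parameter values and function names below are illustrative; this is not the authors' trained model, which operates on character+word vector encodings):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step for a scalar input x and hidden state h.
    W, U, b each hold three parameters: update gate, reset gate, candidate."""
    z = sigmoid(W[0] * x + U[0] * h + b[0])               # update gate
    r = sigmoid(W[1] * x + U[1] * h + b[1])               # reset gate
    h_cand = math.tanh(W[2] * x + U[2] * (r * h) + b[2])  # candidate state
    return (1.0 - z) * h + z * h_cand                      # interpolate old/new

# Run a toy encoded sequence through the unit, carrying the state forward
h = 0.0
for x in [0.5, -1.0, 0.25]:
    h = gru_step(x, h, W=(0.8, 0.5, 1.2), U=(0.4, 0.3, 0.7), b=(0.0, 0.0, 0.0))
print(h)
```

The update gate z decides how much of the previous state to keep, while the reset gate r controls how much history feeds the candidate state; in the vector case the same equations apply elementwise with matrix-valued W and U.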