Comparison of natural language processing tools for automatic gene ontology annotation of scientific literature

Beasley, Lucas; Manda, Prashanti

doi:10.7287/peerj.preprints.27028

Cited by 5 publications (10 citation statements)

References 21 publications (27 reference statements)


“…CNNs were also used for biomedical named entity recognition combined with n-gram character embeddings resulting in enhanced performance in a comparison with other deep learning models [25]. A comprehensive review of deep learning methods for named entity recognition can be found in [13] and a comparison of existing text mining tools in [3].…”

Section: Related Work (mentioning)

confidence: 99%

Automated ontology-based annotation of scientific literature using deep learning

Manda, SayedAhmed, Mohanty

2020

Proceedings of the International Workshop on Semantic Big Data

Self Cite


Representing scientific knowledge using ontologies enables data integration and consistent machine-readable data representation, and allows for large-scale computational analyses. Text mining approaches that can automatically process and annotate scientific literature with ontology concepts are necessary to keep up with the rapid pace of scientific publishing. Here, we present deep learning models (Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM)) combined with different input encoding formats for automated Named Entity Recognition (NER) of ontology concepts from text. The Colorado Richly Annotated Full Text (CRAFT) gold standard corpus was used to train and test our models. Precision, Recall, F-1, and Jaccard semantic similarity were used to evaluate the performance of the models. We found that GRU-based models outperform LSTM models across all evaluation metrics. Surprisingly, considering the top two probabilistic predictions of the model for each instance instead of the top one resulted in a substantial increase in accuracy. Inclusion of ontology semantics via subsumption reasoning yielded a modest performance improvement.
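The evaluation metrics named in this abstract can be illustrated with a short Python sketch. It assumes annotations are (token_index, concept_id) pairs, the ontology is a hypothetical dict mapping each concept to its direct is_a parents, and "Jaccard semantic similarity" means the Jaccard index of the two concepts' ancestor (subsumer) sets, a common choice that is not necessarily the exact definition used in the paper. All function names and the toy hierarchy are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical ontology representation: concept -> set of direct is_a parents.
Ontology = Dict[str, Set[str]]


def ancestors(term: str, onto: Ontology) -> Set[str]:
    """Return the term plus all of its transitive superclasses (subsumers)."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in onto.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard_similarity(pred: str, gold: str, onto: Ontology) -> float:
    """Jaccard index of the ancestor sets of a predicted and a gold concept."""
    a, b = ancestors(pred, onto), ancestors(gold, onto)
    return len(a & b) / len(a | b)


def precision_recall_f1(
    predicted: Set[Tuple[int, str]], gold: Set[Tuple[int, str]]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-1 over (token_index, concept_id) pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def top_k_correct(ranked: List[str], gold: str, k: int) -> bool:
    """Treat an instance as correct if the gold concept is among the top-k predictions."""
    return gold in ranked[:k]
```

With a toy hierarchy such as `{"binding": {"root"}, "protein_binding": {"binding"}, "dna_binding": {"binding"}}`, the two sibling concepts share two of four ancestors, so their Jaccard similarity is 0.5; a near-miss prediction therefore earns partial credit that exact-match precision and recall do not give. Calling `top_k_correct` with `k=2` mirrors the top-two evaluation described in the abstract.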



“…To the best of our knowledge, there is no “gold standard” for the annotation rate. Various studies conducting semantic annotation in various domains, such as Bada, Vasilevsky, Haendel, and Hunter (2016) and Beasley and Manda (2018) in bioinformatics, Lévy, Tomeh, and Ma (2014)…”

Section: Step 3: Evaluate Annotation Results (mentioning)

confidence: 99%

“…To the best of our knowledge, there is no “gold standard” for the annotation rate. Various studies conducting semantic annotation in various domains, such as Bada, Vasilevsky, Haendel, and Hunter (2016) and Beasley and Manda (2018) in bioinformatics, Lévy, Tomeh, and Ma (2014) and Fiorelli, Pazienza, and Stellato () in annotation tool development, Jadidinejad, Mahmoudi, and Meybodi () in document classification, and Ali et al. () in knowledge engineering, try to maximize the number of words annotated but do not report any number for annotation rate. We report on our findings regarding the annotation rate in Section 5.4.…”

Section: A Methods For Text Coherence Measurement (mentioning)

confidence: 99%

Assessment of text coherence using an ontology‐based relatedness measurement method

Giray, Ünalır

2019

Expert Systems


This paper proposes a novel method for assessing text coherence. Central to this approach is an ontology‐based representation of text, which captures the level of relatedness between consecutive sentences via ontologies. Our method encompasses annotating text using ontological concepts and assessing text coherence based on relatedness measurement among these concepts. The ontology‐based relatedness measurement method used in this study considers various types of relationships in ontologies, as well as relationships derived via an inference engine, when computing relatedness. We hypothesized that a rich variety of relationships and inferred facts in ontologies would improve the success of text coherence assessment. Our results demonstrate that the use of ontologies yields coherence values that have a higher correlation with human ratings.


“…The large majority of text mining approaches for recognizing ontology concepts from text either rely on lexical and syntactic analysis of text in addition to machine learning (Cui et al, ; Jonquet et al, ; Manda, Beasley, & Mohanty, ; Mungall et al, ). Beasley and Manda () recently conducted a comparison of a number of text mining tools at annotating biological literature with GO terms.…”

Section: Text Mining (mentioning)

confidence: 99%

Data mining powered by the gene ontology

Manda

2020

WIREs Data Mining and Knowledge Discovery

Self Cite


The gene ontology (GO) is a widely used resource for describing molecular functions, biological processes, and cellular components of gene products. Since its inception in 2006, the GO has been used to describe millions of gene products resulting in a massive data store of over 6 million annotations. The staggering amount of data that has resulted from annotating gene products with GO terms has led the way and opened new avenues for a wide variety of large-scale computational analyses. Specifically, a variety of data mining techniques such as association rule mining, clustering etc. have been applied successfully to a range of biological applications.

