2017
DOI: 10.12688/f1000research.11389.1

PubRunner: A light-weight framework for updating text mining results

Abstract: Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are underused because their results are static and do not reflect the constantly expanding knowledge in the field. In ord…

Cited by 5 publications (4 citation statements)
References: 7 publications
“…Our dataset is fairly sizeable; updating the dataset with only the most recent papers being added to the PMC collection was not initially addressed by our work. For this release we have tested the PubRunner (Anekalla et al., 2017) in order to periodically process only the most recent entries to PMC. In order to make it easier for us to release updates of the dataset we are modifying PubRunner and adapting it to our case.…”
Section: Discussion (mentioning)
confidence: 99%
“…We then used the PubRunner infrastructure to apply these two classifiers across all the aligned sentences [25]. This enabled the use of a compute cluster to quickly classify sentences as to whether they contain pharmacogenomic information. We then outputted relations along with the normalized form of the chemical and genes and other metadata.…”
Section: Methods (mentioning)
confidence: 99%
“…Our approach is compatible with other text mining frameworks, such as PubRunner [27], for updating processed citations with the latest PubMed entries, and the many available text processing toolkits which can be used to process raw article metadata into processed feature sets, for example the NLTK (http://www.nltk.org/), the Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/), and Apache Open NLP (http://opennlp.apache.org/). The approach is also amenable to implementation on large scale parallel processing data analytic systems, such as Apache Spark (https://spark.apache.org/), which includes parallel implementations of several machine learning algorithms including SVM [28,29].…”
Section: Can This Framework Be Generalized To Other Biomedical Text M… (mentioning)
confidence: 96%