2018
DOI: 10.1093/database/bay098

Large-scale automated machine reading discovers new cancer-driving mechanisms

Abstract: PubMed, a repository and search engine for biomedical literature, now indexes >1 million articles each year. This exceeds the processing capacity of human domain experts, limiting our ability to truly understand many diseases. We present Reach, a system for automated, large-scale machine reading of biomedical papers that can extract mechanistic descriptions of biological processes with relatively high precision at high throughput. We demonstrate that combining the extracted pathway fragments with existing biol…

Cited by 62 publications (101 citation statements)
References 33 publications (38 reference statements)
“…Another major barrier to pathway data access is that only a small handful of pathway and molecular interaction resources that curate data from the literature remain actively funded and they are only able to cover a relatively small part of the rapidly growing literature. To address this, the PC team is advancing text-mining technology to extract pathway information directly from the existing literature (60,66,67), and developing a curation support tool that empowers authors themselves to capture and share structured summaries of knowledge described in their articles. These efforts, when combined with continued expert curation, may meet the challenge of providing high-quality, computable pathway information that can be effectively searched and analyzed by the broader research community.…”
Section: Results (mentioning)
confidence: 99%
“…PC data has also been used as prior information to predict cellular response based on data collected in systematic perturbation experiments (58). Several tools and algorithms developed within DARPA's Big Mechanism program extensively use PC to evaluate fragments extracted from the literature using machine reading (35,59,60).…”
Section: Analytical Tools Using the Pathway Commons Data Source (mentioning)
confidence: 99%
“…Databases included PhosphoSitePlus [5], SIGNOR [6], HPRD [10], NCI-PID [11], Reactome [7], and the BEL Large Corpus (http://www.openbel.org). Text mining was performed using multiple systems having complementary strengths, including REACH [9], Sparser [12], and RLIMS-P [13]. REACH and Sparser were run on a text corpus that included both abstracts and full-text articles; RLIMS-P results were obtained from the iTextMine service [14] (Methods).…”
Section: Pathway Databases and Literature Contain Annotations Of Huma (mentioning)
confidence: 99%
“…Such information is currently available from databases such as PhosphoSitePlus [5], SIGNOR [6] and Reactome [7]. These databases were assembled by manual curation but automated text mining has also been used to extract information on PTMs from the literature [8,9]. Ideally, functional analysis would involve the use of information aggregated from as many of these databases and text mining tools as possible.…”
Section: Introduction (mentioning)
confidence: 99%
“…We used the Integrated Network and Dynamical Reasoning Assembler (INDRA) system 15 to collect and assemble a set of statements from the scientific literature and pathway databases. INDRA integrates content from i) multiple natural language processing systems (REACH 38 and Sparser 39) of primary literature in the minable NCBI corpus and ii) queries on pathway databases (Pathway Commons 14,21, BEL Large Corpus 40, SIGNOR 41). INDRA extracts information about molecular mechanisms from these sources in a common statement representation, which has a rich functional semantic with respect to reactant and reaction types.…”
Section: Assembly Of Mechanistic Network Using Indra (mentioning)
confidence: 99%