Mining Medline: Abstracts, Sentences, or Phrases?

Ding, Jing; Berleant, Daniel; Nettleton, Dan; Wurtele, Eve Syrkin

doi:10.1142/9789812799623_0031

Cited by 142 publications

(116 citation statements)

References 17 publications

Supporting

Mentioning

112

Contrasting

Unclassified


“…Many public databases and bioinformatics tools have been developed and are currently available for use (Ding & Berleant, 2002). The primary goal of bioinformaticians is to develop reliable databases and effective analysis tools capable of handling bulk amount of biological data.…”

Section: Bioinformatics Workflow and Platform Design (mentioning)

confidence: 99%

In Silico Analysis of Golgi Glycosyltransferases: A Case Study on the LARGE-Like Protein Family

Hwa¹, Lin², Subramani³

2011

Computational Biology and Applied Bioinformatics


“…In the latter case, if the two co-occurring words/ phrases are physically positioned very far apart, co-occurrence may have no meaning. A recent study quantifies some of the precision-recall tradeoffs for different units, ranging from phrases to Abstracts [33].…”

Section: Semantic Boundaries (mentioning)

confidence: 99%

Science and Technology Text Mining: Electric Power Sources

Kostoff¹, Tshiteya², Pfeil³, et al.

2004


Database Tomography (DT) is a textual database analysis system consisting of two major components: 1) algorithms for extracting multi-word phrase frequencies and phrase proximities (physical closeness of the multi-word technical phrases) from any type of large textual database, to augment 2) interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a Power Sources database derived from the Science Citation Index (SCI). Phrase frequency analysis by the technical domain experts provided the pervasive technical themes of the Power Sources database, and the phrase proximity analysis provided the relationships among the pervasive technical themes. Bibliometric analysis of the Power Sources literature supplemented the DT results with author/ journal/ institution/ country publication and citation data.


“…For protein name tagging, accuracies as high as around 95% have been reported [67], but care should be given to the test set composition. It is known that for some organisms or some protein subdomains, the nomenclature is fairly rigidly standardized and excellent tagging accuracy can be reached there.…”

Section: Named Entity Tagging (mentioning)

confidence: 99%

“…The effect of accidental co-occurrence could be minimized by requiring frequent corroboration of any pairing. Using a similar co-occurrence approach, Ding et al [67] found that precision and recall traded off when the length of the used text segment was varied. Working with phrases gave generally better precision, while working with entire abstracts gave best recall; sentences scored in between.…”

Section: Fact Extraction (mentioning)

confidence: 99%
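
The tradeoff described in the statement above (a larger text unit tends to raise recall, a smaller one tends to raise precision) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the entity names `GeneA` and `GeneB`, the three toy abstracts, their gold labels, and the naive sentence splitter are hypothetical and are not taken from the cited study, which also evaluated phrases.

```python
import re

def cooccurs(text, term_a, term_b):
    """True if both terms appear (case-insensitively) in the same text unit."""
    lowered = text.lower()
    return term_a.lower() in lowered and term_b.lower() in lowered

def split_sentences(abstract):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]

def predict(abstract, term_a, term_b, unit):
    """Predict an interaction if the terms co-occur within the chosen unit."""
    if unit == "abstract":
        return cooccurs(abstract, term_a, term_b)
    if unit == "sentence":
        return any(cooccurs(s, term_a, term_b) for s in split_sentences(abstract))
    raise ValueError(f"unknown unit: {unit}")

def precision_recall(predictions, gold):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(p and g for p, g in zip(predictions, gold))
    fp = sum(p and not g for p, g in zip(predictions, gold))
    fn = sum(g and not p for p, g in zip(predictions, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy corpus: (abstract text, does it really describe an interaction?)
corpus = [
    ("GeneA binds GeneB in vitro.", True),
    ("GeneA is expressed in leaves. It was later shown to activate GeneB.", True),
    ("GeneA was cloned. GeneB was studied separately in roots.", False),
]
gold = [label for _, label in corpus]

abstract_preds = [predict(text, "GeneA", "GeneB", "abstract") for text, _ in corpus]
sentence_preds = [predict(text, "GeneA", "GeneB", "sentence") for text, _ in corpus]

abstract_scores = precision_recall(abstract_preds, gold)
sentence_scores = precision_recall(sentence_preds, gold)
print("abstract:", abstract_scores)
print("sentence:", sentence_scores)
```

On this toy data the abstract-level rule finds every true interaction but also accepts the unrelated pair, while the sentence-level rule makes no false claims but misses the interaction stated across two sentences. Real corpora would need proper entity tagging and sentence segmentation, so this is only a sketch of the general idea.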

Getting to the (c)ore of knowledge: mining biomedical literature

Bruijn, Martin

2002

International Journal of Medical Informatics

104


Literature mining is the process of extracting and combining facts from scientific publications. In recent years, many computer programs have been designed to extract various molecular biology findings from Medline abstracts or fulltext articles. The present article describes the range of text mining techniques that have been applied to scientific documents. It divides 'automated reading' into four general subtasks: text categorization, named entity tagging, fact extraction, and collection-wide analysis. Literature mining offers powerful methods to support knowledge discovery and the construction of topic maps and ontologies. An overview is given of recent developments in medical language processing. Special attention is given to the domain particularities of molecular biology, and the emerging synergy between literature mining and molecular databases accessible through Internet.

