2009
DOI: 10.1504/ijcibsb.2009.024052
Improved biomedical document retrieval system with PubMed term statistics and expansions

Abstract: Large biomedical abstract databases such as MEDLINE enable users to search large bodies of biomedical knowledge quickly. In this study, we describe a new framework to improve the performance of MEDLINE document retrieval. We first analysed and built normalized term frequency distributions for 1.8 million terms by sampling from 1,500,000 MEDLINE abstracts. Then, we developed a statistical model to identify significantly observed terms ('gists') in a document as additional document keywords to help improve…

Citations: cited by 4 publications (6 citation statements)
References: 8 publications
“…We use a term frequency statistical method described by Li and Chen (2009). This method makes use of term statistical distribution from the entire MEDLINE abstracts to calculate p-value of each term's significance in being observed in any collection of retrieved MEDLINE abstracts T_BR.…”
Section: Select Significant Breast Cancer Drugs (mentioning)
confidence: 99%
“…In this work, we first calculate a p-value for each drug term t_j in T_BR using methods described in Li and Chen (2009) and later derive its false discovery rate (FDR). Let the null hypothesis H_0 be that document frequency of drug term t_j in T_BR comes from a random distribution T_Random.…”
Section: Select Significant Breast Cancer Drugs (mentioning)
confidence: 99%
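
The procedure this statement describes (a per-term p-value against a background distribution, followed by a false discovery rate) can be illustrated with a minimal Python sketch. It is not the method of Li and Chen (2009), whose exact statistical model is not given here. As a stand-in, it assumes a one-sided binomial tail test on each term's document frequency and a Benjamini-Hochberg adjustment. The term names, background rates, counts, and the 0.05 cutoff are all hypothetical toy values.

```python
from math import comb


def binomial_tail_p(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): the chance of seeing a term in at
    least k of n retrieved abstracts if it occurs at the background rate p0."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical background document-frequency rates (fraction of all abstracts
# that contain the term) and document frequencies within a retrieved set.
background_rate = {"tamoxifen": 0.001, "aspirin": 0.01, "water": 0.05}
retrieved_df = {"tamoxifen": 12, "aspirin": 3, "water": 9}
n_retrieved = 200  # size of the retrieved collection (assumed)

terms = list(background_rate)
pvals = [binomial_tail_p(retrieved_df[t], n_retrieved, background_rate[t]) for t in terms]
qvals = benjamini_hochberg(pvals)

# Keep terms whose FDR-adjusted value falls under an assumed 0.05 cutoff.
significant = [t for t, q in zip(terms, qvals) if q < 0.05]
print(significant)
```

In this toy data only the term that is strongly over-represented relative to its background rate passes the cutoff. Terms that appear about as often as the background predicts do not.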
“…The system involving three major components in text mining performed the task of text classification. Li and Chen (2009) also described a new framework to improve the performance of medical document retrieval. Other systems as reported in Mamlin et al (2003) included MedLEE and LifeCode®.…”
Section: Expert Systems For Medical Data Analysis (mentioning)
confidence: 99%
“…We use a term frequency statistical method described by Li and Chen [16]. This method makes use of term statistical distribution from the entire PubMed abstracts to calculate p-value of each term's significance in being observed in any collection of retrieved PubMed abstracts.…”
Section: Build Biogists For Breast Cancer Related Drug Compounds (mentioning)
confidence: 99%
“…In this work, we first calculate a p-value for each term in using methods described in [16] and later derive its false discovery rate. Let the null hypothesis Ho be that document frequency of term in come from a random distribution .…”
Section: Build Biogists For Breast Cancer Related Drug Compounds (mentioning)
confidence: 99%