2009
DOI: 10.1186/1471-2105-10-s9-s12

Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease

Abstract: Background: Natural Language Processing (NLP) systems can be used for specific Information Extraction (IE) tasks such as extracting phenotypic data from the electronic medical record (EMR). These data are useful for translational research and are often found only in free text clinical notes. A key required step for IE is the manual annotation of clinical corpora and the creation of a reference standard for (1) training and validation tasks and (2) to focus and clarify NLP system requirements. These tasks are t…

citations
Cited by 38 publications
(29 citation statements)
references
References 10 publications
“…Other existing biomedical datasets annotate only diseases; they include the NCBI disease corpus [4] which consists of 793 PubMed abstracts with 6,892 disease mentions and 790 unique disease concepts mapped to the Medical Subject Headings (MeSH), and the Arizona Disease Corpus (AZDC) [9] which contains 2,784 sentences from MEDLINE abstracts annotated with disease mentions and mapped to the Unified Medical Language System (UMLS). Symptom recognition [10] is a relatively new task, often included in more general categories such as clinical concepts [19], medical problems [18] or phenotypic information [16]. Even on these categories, few studies take advantage of considering the linguistic context in which symptoms appear, and they are more focused on the linguistic analysis.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…H & P NOTEs had a similar number of average sections, and an equally large standard deviation of 13.88. The document types with the lowest average number of sections (2) were NO SHOW NOTE (stdev = 2.81), GROUP COUNSELING NOTE (stdev = 1.38), IMMUNIZATION NOTE (stdev = 1.26), and SCANNED NOTE (stdev = 0.66).…”
Section: Document Structure Analysis (Section)
Citation type: mentioning
confidence: 99%
“…A number of studies have applied Natural Language Processing (NLP) techniques to VistA free text data [1][2][3][4][5][6][7], with promising results. These studies have, however, only explored a very small fraction of the vast amount of VistA notes in terms of domain and facility coverage.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In the medical domain, Knowtator (Ogren 2006) has been most often used (Ogren et al 2008; South et al 2009; Roberts et al 2009). There exist several available open-source tools for manually annotating text corpora.…”
Section: Annotation Tools
Citation type: mentioning
confidence: 99%