Proceedings of the 18th BioNLP Workshop and Shared Task 2019
DOI: 10.18653/v1/w19-5026

Is artificial data useful for biomedical Natural Language Processing algorithms?

Abstract: A major obstacle to the development of Natural Language Processing (NLP) methods in the biomedical domain is data accessibility. This problem can be addressed by generating medical data artificially. Most previous studies have focused on the generation of short clinical text, and evaluation of the data utility has been limited. We propose a generic methodology to guide the generation of clinical text with key phrases. We use the artificial data as additional training data in two key biomedical NLP tasks: text …


Cited by 4 publications (9 citation statements), published 2020–2021. References 25 publications.
“…In the end, the decision about which metric to use in such cases depends on the gain from not missing out on the minority classes, which may cost a small drop in the majority classes (which may still end up with relatively high performance); the system owner should weigh this trade-off. Further, we evaluated the classifier performance on the generated sentences alone (following Wang et al., 2019), without the train set, and found that micro accuracy falls by 17.5% and macro accuracy by 7.9%. This metric represents how well the generated dataset represents the train set.…”
Section: BalaGen: Improving Real-Life SUC Corpora (confidence: 99%)
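The micro/macro distinction in the statement above can be sketched as follows. The cited paper does not spell out its exact metric definitions, so this assumes the common interpretation: micro accuracy is the overall fraction of correct predictions, while macro accuracy is the unweighted mean of per-class accuracy, which gives minority classes equal weight.

```python
from collections import defaultdict

def micro_macro_accuracy(y_true, y_pred):
    """Micro accuracy: overall fraction of correct predictions.
    Macro accuracy: unweighted mean of per-class accuracy (recall),
    so each class counts equally regardless of its frequency."""
    assert len(y_true) == len(y_pred) and y_true
    correct = defaultdict(int)  # per-class correct counts
    total = defaultdict(int)    # per-class instance counts
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    micro = sum(correct.values()) / len(y_true)
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro

# Imbalanced toy data: class "a" dominates; one minority-class error.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["a", "b"]
micro, macro = micro_macro_accuracy(y_true, y_pred)
# micro = 9/10 = 0.9; macro = mean(8/8, 1/2) = 0.75
```

A single minority-class error barely moves micro accuracy but drops macro accuracy sharply, which is exactly the trade-off the quoted statement says the system owner should weigh.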
“…The generation of synthetic EHR text for use in medical NLP is still at an early stage [3]. Most studies focus on the creation of English EHR text, using hospital discharge summaries from the MIMIC-III database [7,8,13,14]. In addition, a corpus of English Mental Health Records was explored [15].…”
Section: Generating Synthetic EHR Notes (confidence: 99%)
“…In addition, a corpus of English Mental Health Records was explored [15]. Unlike the mixed healthcare data used in this study, these EHR notes have a more consistent, template-like structure and contain medical jargon, lending themselves to the clinical/biomedical downstream tasks found in related work [8, 13–15]. Most of these studies focused on classification downstream tasks.…”
Section: Generating Synthetic EHR Notes (confidence: 99%)