2017
DOI: 10.1002/asi.23936
Clinical information extraction using small data: An active learning approach based on sequence representations and word embeddings

Abstract: This article demonstrates the benefits of using sequence representations based on word embeddings to inform the seed selection and sample selection processes in an active learning pipeline for clinical information extraction. Seed selection refers to choosing an initial sample set to label to form an initial learning model. Sample selection refers to selecting informative samples to update the model at each iteration of the active learning process. Compared to supervised machine learning approaches, active lea…
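
The two stages named in the abstract, seed selection and sample selection, can be illustrated with a minimal pool-based active learning loop. This is only a sketch under stated assumptions, not the article's method: the seed is chosen at random, samples are chosen by least-confidence uncertainty, and the model is a toy one-dimensional threshold classifier. It does not implement the sequence representations or word embeddings the article describes. All names (`least_confidence`, `active_learning_loop`, `train`, `predict_proba`, `oracle`) are hypothetical.

```python
import math
import random


def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probs)


def train(labeled):
    """Toy model (assumption): a threshold midway between the largest
    labeled negative and the smallest labeled positive."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2.0


def predict_proba(threshold, x, sharpness=20.0):
    """Return [P(label 0), P(label 1)] from a logistic curve around the threshold."""
    p1 = 1.0 / (1.0 + math.exp(-sharpness * (x - threshold)))
    return [1.0 - p1, p1]


def oracle(x):
    """Stand-in for the human annotator (assumption): true label is x > 0.5."""
    return 1 if x > 0.5 else 0


def active_learning_loop(pool, seed_size, batch_size, rounds, rng):
    unlabeled = list(pool)

    # Seed selection: pick the initial set to label (random baseline here).
    rng.shuffle(unlabeled)
    labeled = [(x, oracle(x)) for x in unlabeled[:seed_size]]
    unlabeled = unlabeled[seed_size:]
    model = train(labeled)

    for _ in range(rounds):
        if not unlabeled:
            break
        # Sample selection: query the samples the current model is least sure about.
        unlabeled.sort(
            key=lambda x: least_confidence(predict_proba(model, x)),
            reverse=True,
        )
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled.extend((x, oracle(x)) for x in batch)
        # Update the model with the newly labeled batch.
        model = train(labeled)

    return model, labeled


pool = [i / 199 for i in range(200)]
model, labeled = active_learning_loop(
    pool, seed_size=5, batch_size=5, rounds=5, rng=random.Random(0)
)
print(f"labeled {len(labeled)} of {len(pool)} samples; threshold = {model:.3f}")
```

In this toy setting the loop labels 30 of the 200 pool items and concentrates its queries near the decision boundary. A different seed-selection strategy would replace only the shuffle step, which is why the choice of seed can matter most in the earliest iterations.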

Citations: cited by 15 publications (11 citation statements)
References: 33 publications
“…Since the initial labelled set (i.e. seed set) is an important factor in increasing the performance of AL at early iterations (Kholghi, Vine, Sitbon, Zuccon, & Nguyen, ), it could have influenced the AL performance reported in Qian et al. ().…”
Section: Introduction (mentioning)
confidence: 99%
“…Although DKI achieved the lowest time rate among AL query strategies, it should be noted that it strongly relies on the availability of the domain knowledge (i.e., less generalizable) compared to unsupervised-based (i.e., ULC, UID, and 2L-UID) and similarity-based approaches (i.e., IDiv) [186]. Another observation is that the time rates in Table 4 show that the actual time savings are much closer to the estimated concept annotation rates than to the sequence and token annotation rates.…”
Section: Discussion (mentioning)
confidence: 99%
“…We also study the role of a smart seed selection approach in reducing the annotation time from early batches of active learning. Our previous study demonstrated that Longest Sequence Cluster (LSC) can lead to an initial model with significantly higher effectiveness at early batches of AL compared to when using RS [186]. We use LSC and RS seed selection approaches to build two initial models.…”
Section: Objective (mentioning)
confidence: 99%