2017
DOI: 10.1007/978-3-319-69775-8_9

Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

Abstract: From medical charts to national census, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, today, healthcare now generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions…

Cited by 15 publications
(18 citation statements)
References 56 publications
“…It has been identified that text ambiguity, lack of resources, complex nested entities, identification of contextual information, noise in the form of homonyms, language variability and missing data are important challenges in entity recognition from unstructured big data [11,16,105]. It is also found that the volume of unstructured big data changed the technological paradigm from traditional rule-based or learning-based techniques to [9,10].…”
Section: Named Entity Recognition (NER)
Citation type: mentioning
confidence: 99%
“…These techniques extract entity mentions from the text, clusters the similar entities and identify relations [120]. In this case, intensive data preprocessing will be required for big data because unstructured big data sets have missing values, noise and other errors [16] that produce uninformative as well as incoherent extractions. Semi-supervised techniques use both labeled and unlabeled corpus with small degree of supervision [121].…”
Section: Rule-based Approaches Learning-based Approaches
Citation type: mentioning
confidence: 99%
“…The question of the quality of medical record and of the data extracted from there is still understudied [81,10], let alone in regard to machine learning projects [27].…”
Section: Between Gold Standards and Ghost Standards
Citation type: mentioning
confidence: 99%