2018
DOI: 10.14778/3229863.3229876

Data integration and machine learning

Abstract: As data volume and variety have increased, so have the ties between machine learning and data integration become stronger. For machine learning to be effective, one must utilize data from the greatest possible variety of sources; and this is why data integration plays a key role. At the same time machine learning is driving automation in data integration, resulting in overall reduction of integration costs and improved accuracy. This tutorial focuses on three aspects of the synergistic relationship between dat…

Citations: cited by 25 publications (14 citation statements)
References: 45 publications
“…Next to the elimination of existing errors in the input data, procedures for feature engineering, carried out by data scientists (in cooperation with domain experts), are necessary for other domain-specific ML operations. Data cleaning can be split into three parts [43], where error detection like duplicate data, violations of logical constraints, or incorrect value recognition is the first task. Moreover, solving every detected error is a second operation, and the data imputation supplements the missing and incomplete data as the last step.…”
Section: Data Cleaning and Labeling (mentioning)
confidence: 99%
“…However, their increased usefulness depends on the widespread adoption of ontologies and metadata standards by data providers, a process that is still underway. A promising approach to overcome these limitations has been to use machine learning techniques to support open data integration activities (Dong & Rekatsinas, 2018; Miller, 2018), such as entity matching (Mudgal et al., 2018; Nargesian, Zhu, Pu, & Miller, 2018). These recently proposed techniques could be leveraged and extended for integrating biodiversity and other related datasets.…”
Section: Biodiversity Informatics Challenges and Concluding Remarks (mentioning)
confidence: 99%
“…For biomedical data sets, integration can involve standardization by mapping to ontologies with controlled vocabularies [43-45]. Although current approaches use deep learning for integration [46-50], generating a training corpus and validating results require domain expert input. For example, Cui et al. [35] require domain experts to validate data curation efforts for studying sudden death in epilepsy.…”
Section: Challenges In the Data Pipeline (mentioning)
confidence: 99%