Proceedings of the 27th International Conference on Scientific and Statistical Database Management 2015
DOI: 10.1145/2791347.2791358

Towards automated prediction of relationships among scientific datasets

Cited by 4 publications
(3 citation statements)
References 9 publications
“…Since the statistics may not be sufficient in determining the relationship, ReConnect asks the user to select a candidate relationship for validation. In order to identify the relationship between all possible pairs in a large collection of datasets, Abdussalam et al [113] further propose a system, entitled ReDiscover, to automate the relationship discovery process without involving user input. ReDiscover first computes column statistics and then feeds them into machine learning models to predict the relationship.…”
Section: Additional Related Work (mentioning)
confidence: 99%
“…The EUSES corpus collects spreadsheets used as databases, and for financial, grading, homework, inventory, and modeling purposes. EUSES is frequently used by researchers building spreadsheet tools [Alawini et al 2015; Barowy et al 2014, 2015; Cheung et al 2016; Grigoreanu et al 2010; Hermans and Dig 2014; Hermans et al 2012a, 2010, 2013; Hofer et al 2013; Joharizadeh 2015; Le and Gulwani 2014; Muşlu et al 2015; Singh et al 2017]. All of the categories present in EUSES are represented in the CUSTODES suite.…”
Section: About the Benchmarks (mentioning)
confidence: 99%
“…The second aspect is that these tools focus on single files and they are unaware of the dataflow of a scientific analysis. Data in one file rarely refers to data in other files, and recovering this information without the explicit dataflow is a research question on its own [22]. In our setting we have the explicit dataflow (lineage) in the workflow description and provenance, and this information needs to be augmented with information extracted from data.…”
Section: Introduction (mentioning)
confidence: 99%