2000
DOI: 10.15760/etd.2918

Identifying Relationships between Scientific Datasets

Abstract: Scientific datasets associated with a research project can proliferate over time as a result of activities such as sharing datasets among collaborators, extending existing datasets with new measurements, and extracting subsets of data for analysis. As such datasets begin to accumulate, it becomes increasingly difficult for a scientist to keep track of their derivation history, which complicates data sharing, provenance tracking, and scientific reproducibility. Understanding what relationships exist between dat…

Cited by 2 publications (3 citation statements)
References 42 publications
“…The automatic discovery of relationships between research datasets is of paramount importance for the successful implementation of an exploratory approach to knowledge production [21]. Therefore, the development of software able to discover data relationships to establish interconnections between datasets must be hastened.…”
Section: Data Analyzers
Citation type: mentioning
confidence: 99%
“…REDISCOVER [2] is an example of a proposed solution aimed in this direction. It is based on machine learning techniques, such as Support Vector Machines [8], to identify matching columns between scientific tabular data.…”
Section: Spatial
Citation type: mentioning
confidence: 99%
“…This data may consist of raw files that have not yet been ingested into a database system and for which the schema may be unfamiliar and not adequately documented. Furthermore, the entire data set may be composed of multiple files with heterogeneous schemes for the following reasons: They come from different sources, they were produced without proper guidelines or a combination of both [1], [2], [3].…”
Section: Introduction
Citation type: mentioning
confidence: 99%