Identifying Relationships between Scientific Datasets

Alawini, Abdussalam

doi:10.15760/etd.2918

Cited by 2 publications

(3 citation statements)

References 42 publications

“…The automatic discovery of relationships between research datasets is of paramount importance for the successful implementation of an exploratory approach to knowledge production [21]. Therefore, the development of software able to discover data relationships to establish interconnections between datasets must be hastened.…”

Section: Data Analyzers (mentioning)

confidence: 99%

An exploratory approach to data driven knowledge creation

et al. 2023

This paper describes a new approach to knowledge creation that is instrumental for the emerging paradigm of data-intensive science. The proposed approach enables the acquisition of new insights from data by exploiting existing relationships between diverse types of datasets acquired through various modalities. The value of data consistently improves when it can be linked to other data, because linking multiple types of datasets allows novel data patterns to be created within a scientific data space. These patterns enable exploratory data analysis, a strategy that emphasizes incremental and adaptive access to the datasets constituting a scientific data space while maintaining an open mind to alternative possibilities of data interconnectivity. One technology, Linked Open Data (LOD), was developed to enable the linking of datasets. We argue that LOD has several limitations that prevent it from being fully exploited to acquire new insights. In this paper, we outline a new approach that enables researchers to dynamically create data patterns in a research data space by exploiting explicit and implicit (hidden) relationships between distributed research datasets. This dynamic creation of data patterns enables the exploratory data analysis strategy.

“…REDISCOVER [2] is an example of a proposed solution aimed in this direction. It is based on machine learning techniques, such as Support Vector Machines [8], to identify matching columns between scientific tabular data.…”

Section: Spatial (mentioning)

confidence: 99%

“…This data may consist of raw files that have not yet been ingested into a database system and for which the schema may be unfamiliar and not adequately documented. Furthermore, the entire data set may be composed of multiple files with heterogeneous schemes for the following reasons: They come from different sources, they were produced without proper guidelines or a combination of both [1], [2], [3].…”

Section: Introduction (mentioning)

confidence: 99%

Noise Resistant Multidimensional Data Fusion via Quasi-Cliques on Hypergraphs

Ayllón, Palomo-Duarte, Dodero

2021

Preprint

Cross-matching data stored in separate files is an everyday activity in the scientific domain. However, sometimes the relation between attributes may not be obvious. The discovery of foreign keys in relational databases is a similar problem, so techniques devised for it can be adapted. Nonetheless, given the different nature of the data, which can be subject to uncertainty, this adaptation is not trivial.

This paper first introduces the concept of Equally-Distributed Dependencies, which is similar to the Inclusion Dependencies from the relational domain. We describe a correspondence in order to bridge existing ideas. We then propose PresQ: a new algorithm based on the search for maximal quasi-cliques on hypergraphs, which makes it more robust to the nature of uncertain numerical data. This algorithm has been tested on three public datasets, showing promising results both in its capacity to find multidimensional equally-distributed sets of attributes and in run-time.
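As a point of reference for the abstract above, a unary inclusion dependency in the relational sense holds when every value of one column also appears in another column, which is the usual precondition for a foreign key. The sketch below illustrates only that exact-match notion; it is not PresQ and does not implement Equally-Distributed Dependencies. The function names, the `threshold` parameter, and the sample columns are hypothetical and chosen purely for illustration.

```python
from typing import Hashable, Iterable


def inclusion_ratio(dependent: Iterable[Hashable],
                    referenced: Iterable[Hashable]) -> float:
    """Fraction of distinct values in `dependent` that also occur in `referenced`."""
    dep_values = set(dependent)
    if not dep_values:
        return 1.0  # an empty column is trivially included in any other column
    ref_values = set(referenced)
    return len(dep_values & ref_values) / len(dep_values)


def is_inclusion_dependency(dependent: Iterable[Hashable],
                            referenced: Iterable[Hashable],
                            threshold: float = 1.0) -> bool:
    """True when at least `threshold` of the distinct dependent values are covered.

    threshold=1.0 is the strict relational definition; a lower value is an
    assumed relaxation for dirty data, not something taken from the paper.
    """
    return inclusion_ratio(dependent, referenced) >= threshold


# Hypothetical identifier columns read from two separate files.
observations_star_id = [3, 1, 3, 2]
catalog_star_id = [1, 2, 3, 4]

print(is_inclusion_dependency(observations_star_id, catalog_star_id))  # True
print(is_inclusion_dependency(catalog_star_id, observations_star_id))  # False
```

An exact set-containment test like this works for identifiers but is of little use for measured numerical values subject to uncertainty, where two columns describing the same quantity rarely share exact values. That is the gap the abstract describes as making the adaptation non-trivial.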
