Collecting large amounts of data is beneficial in machine learning because it yields less biased models. In many cases, similar data are distributed among organizations, and integrating these data is difficult owing to privacy concerns and cost. Integrating such distributed data without delivering the original data leads to the concept of data collaboration, which securely combines data held by different organizations. We propose a method that shares a distance matrix, computed from the original data using data common to all organizations, to learn neighbor information of the original data. The proposed method robustly integrates distributed data, achieving quality comparable to that of the concatenated raw data, even when each organization holds only a small amount of data and the data bias is large. In addition, the proposed method is applicable to data contaminated by noise. To demonstrate its effectiveness, we performed a classification task on open biological data divided into several pieces and found that the classification results for the divided data were as accurate as those obtained when all data were available. Finally, we show that, as a by-product, the method's robustness to noise improves the anonymity of the original data.
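The core idea of sharing distances to common data, rather than the raw data itself, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the variable names, data shapes, and the choice of Euclidean distance are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (assumed for illustration): every organization holds
# private data of the same dimensionality, and all organizations share a
# small public "anchor" set of common data points.
anchors = rng.normal(size=(5, 3))     # common data shared by all parties
private_a = rng.normal(size=(8, 3))   # organization A's raw data (never shared)
private_b = rng.normal(size=(6, 3))   # organization B's raw data (never shared)

def distances_to_anchors(X, anchors):
    """Pairwise Euclidean distances from each private sample to each anchor."""
    diff = X[:, None, :] - anchors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Each organization shares only its distance matrix, not its raw rows.
shared_a = distances_to_anchors(private_a, anchors)  # shape (8, 5)
shared_b = distances_to_anchors(private_b, anchors)  # shape (6, 5)

# A coordinator can stack the shared matrices to learn neighbor structure
# across organizations without ever seeing the original data.
combined = np.vstack([shared_a, shared_b])           # shape (14, 5)
```

Because only distances to the anchors leave each organization, the raw coordinates are not directly exposed, which is the property the anonymity discussion relies on.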