Representative clustering of uncertain data

Züfle, Andreas; Emrich, Tobias; Schmid, Klaus Arthur; Mamoulis, Nikos; Zimek, Arthur; Renz, Matthias

doi:10.1145/2623330.2623725

Cited by 27 publications

(9 citation statements)

References 51 publications

“…As experimental results verify in Figure 6a,b, the above properties have a positive impact on vulnerability which is pair-wise preserved, showing that the clusters occur within cluster space-time similarity. The authors in [22] provide an efficient scheme for representative clustering on uncertain data. Finally, assuming feature suppression, the method with clustering demonstrates higher robustness or lower vulnerability, which is the main issue in k-anonymity, and thus in privacy preservation.…”

Section: Discussion (mentioning)

confidence: 99%

Storage Efficient Trajectory Clustering and k-NN for Robust Privacy Preserving Spatio-Temporal Databases

et al. 2019

Storing massive volumes of spatio-temporal data has become a difficult task as GPS capabilities and wireless communication technologies have become prevalent in modern mobile devices. As a result, massive trajectory data are produced, incurring expensive costs for storage, transmission, and query processing. A number of algorithms for compressing trajectory data have been proposed to overcome these difficulties. These algorithms try to reduce the size of trajectory data while preserving the quality of the information. In this research work, we focus on both the privacy preservation and the storage problem of spatio-temporal databases. To alleviate this issue, we propose an efficient framework for trajectory representation, entitled DUST (DUal-based Spatio-temporal Trajectory), in which a raw trajectory is split into a number of linear sub-trajectories that are subjected to a dual transformation that formulates the representatives of each linear component of the initial trajectory; thus, the compressed trajectory achieves a compression ratio equal to M : 1. To our knowledge, we are the first to study and address k-NN queries on nonlinear moving object trajectories that are represented in dual dimensional space. Additionally, the proposed approach is expected to reinforce the privacy protection of such data. Specifically, even if an intruder has access to the dual points of trajectory data and tries to reproduce the native points that fit a specific component of the initial trajectory, the identity of the mobile object will remain secure with high probability. In this way, the privacy of the k-anonymity method is reinforced. Through experiments on real spatial datasets, we evaluate the robustness of the new approach and compare it with the one studied in our previous work.

... established as trajectory or mobility mining [1]. Also, the technology of databases is evolving to support the querying and representation of the trajectory of moving objects (e.g., humans, animals, vehicles, natural phenomena). Hence, the main parts of trajectory data-mining include pre-processing, data management, query processing, trajectory data-mining tasks, and privacy protection [2].

Real-life applications, such as the analysis of traffic congestion, intelligent transportation, animal migration habits analysis, cellular communications, military applications, structural and environmental monitoring, disaster/rescue management, as well as remediation, Geographic Information Systems (GIS), Location-Based Services (LBS), and other domains have increased the interest in the area of trajectory data-mining and efficient management of spatio-temporal data.

It should be noted that the explosive growth of social media has produced large-scale mobility datasets whose publication puts people's personal lives at severe risk. Indeed, users have become used to sharing their most-visited or potentially sensitive locations, such as their home, workplace, and holiday locations, which are easy to obtain through social media. No...

“…There is no uncertainties in the original UCR datasets, we need to construct uncertainties based on the method mentioned in [14] first. The uncertainties can be described by the samples representing the possible values, so we choose Gaussian distribution to generate samples for each object in dataset .…”

Section: Methods (mentioning)

confidence: 99%

A Similarity Between Uncertain Data Measurement Method Based on stochastic simulation

Cheng, Chi, Lang 2020

Proceedings of the 13th EAI International Conference on Mobile Multimedia Communications, Mobimedia 2020, 27-28 August 2020, Cy

Measuring the distance between uncertain data is an important basis for accurate clustering. Taking full advantage of the uncertainty characteristics of an object helps to represent the uncertain data more accurately and to calculate its distance. Using the probability distribution function to represent the characteristics of the uncertainty distribution, this paper studies a method for measuring the distance between uncertain objects based on stochastic simulation. The effectiveness of the proposed method is verified by experiments.
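As a rough illustration of the ideas in this entry, the sketch below builds uncertain objects from Gaussian samples (as the citation statement describes) and estimates a distance between two of them by simulation. It is not the method of the cited paper: the noise level `sigma`, the sample counts, the function names, and the choice of the mean Euclidean distance over random sample pairs are all assumptions made for illustration.

```python
import math
import random

def make_uncertain(values, sigma, n_samples, rng):
    """Represent one object as n_samples noisy copies of its observed values.

    Each copy adds independent Gaussian noise N(0, sigma^2) to every value.
    sigma and n_samples are illustrative assumptions, not values from the paper.
    """
    return [[rng.gauss(v, sigma) for v in values] for _ in range(n_samples)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def simulated_distance(obj_a, obj_b, n_draws, rng):
    """Monte Carlo estimate of the expected distance between two uncertain objects.

    Repeatedly draws one sample from each object and averages the distances.
    """
    total = 0.0
    for _ in range(n_draws):
        total += euclidean(rng.choice(obj_a), rng.choice(obj_b))
    return total / n_draws

rng = random.Random(0)  # fixed seed so the run is repeatable
a = make_uncertain([0.0, 0.0, 0.0], sigma=0.1, n_samples=50, rng=rng)
b = make_uncertain([1.0, 1.0, 1.0], sigma=0.1, n_samples=50, rng=rng)
d_ab = simulated_distance(a, b, n_draws=2000, rng=rng)
d_aa = simulated_distance(a, a, n_draws=2000, rng=rng)
```

With these settings, `d_ab` should land near the distance between the two noise-free objects, while `d_aa` stays small but nonzero because two draws from the same uncertain object rarely coincide.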

“…For many application domains, the ability to unearth valuable knowledge from a dataset is impaired by unreliable, erroneous, obsolete, imprecise, and noisy data (Schubert et al 2015; Züfle et al 2014), or, in other words, uncertain data that is commonly described by a probability distribution (Jiang et al 2013; Pei et al 2007). Uncertain data are found in modeling situations where a mathematical model only approximates the actual nonconforming quality control process.…”

Section: Uncertain Data Clustering (mentioning)

confidence: 99%

Complexity Analysis Approach for Prefabricated Construction Products Using Uncertain Data Clustering

AbouRizk, Zaïane, et al. 2018

J. Constr. Eng. Manage.

This paper proposes an uncertain data clustering approach to quantitatively analyze the complexity of prefabricated construction components through the integration of quality performance-based measures with associated engineering design information. The proposed model is constructed in three steps, which (1) measure prefabricated construction product complexity (hereafter referred to as product complexity) by introducing a Bayesian-based nonconforming quality performance indicator; (2) score each type of product complexity by developing a Hellinger distance-based distribution similarity measurement; and (3) cluster products into homogeneous complexity groups by using the agglomerative hierarchical clustering technique. An illustrative example is provided to demonstrate the proposed approach, and a case study of an industrial company in Edmonton, Canada, is conducted to validate the feasibility and applicability of the proposed model. This research inventively defines and investigates product complexity from the perspective of product quality performance with design information associated. The research outcomes provide simplified, interpretable, and informative insights for practitioners to better analyze and manage product complexity. In addition to this practical contribution, a novel hierarchical clustering technique is devised. This technique is capable of clustering uncertain data (i.e., beta distributions) with lower computational complexity and has the potential to be generalized to cluster all types of uncertain data.
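Steps (2) and (3) of this abstract can be sketched as follows, assuming each product is summarised by the two shape parameters of a beta distribution. The Hellinger distance between two beta distributions has a standard closed form in terms of the beta function; the average-linkage rule, the function names, and the example parameters are assumptions for illustration and are not taken from the cited paper.

```python
import math
from itertools import combinations

def log_beta(a, b):
    """Logarithm of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def hellinger_beta(p, q):
    """Hellinger distance between Beta(a1, b1) and Beta(a2, b2).

    Uses H^2 = 1 - B((a1+a2)/2, (b1+b2)/2) / sqrt(B(a1, b1) * B(a2, b2)).
    """
    (a1, b1), (a2, b2) = p, q
    log_bc = log_beta((a1 + a2) / 2, (b1 + b2) / 2) \
        - 0.5 * (log_beta(a1, b1) + log_beta(a2, b2))
    bc = min(1.0, math.exp(log_bc))  # clamp against rounding error
    return math.sqrt(max(0.0, 1.0 - bc))

def agglomerate(items, n_clusters):
    """Average-linkage agglomerative clustering (the linkage is an assumption).

    items is a list of (a, b) beta parameters; returns lists of item indices.
    """
    clusters = [[i] for i in range(len(items))]

    def avg_link(c1, c2):
        total = sum(hellinger_beta(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical products: two with low nonconformance rates, two with higher ones.
products = [(2, 50), (3, 60), (20, 30), (22, 28)]
groups = agglomerate(products, n_clusters=2)
```

On this toy input the two low-rate products end up in one group and the two higher-rate products in the other, since the Hellinger distance across the groups is close to its maximum of 1.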
