Cluster analysis over Moving Object Databases (MODs) is a challenging research topic that has attracted the attention of the mobility data mining community. In this paper, we study the temporal-constrained sub-trajectory cluster analysis problem, where the aim is to discover clusters of sub-trajectories given an ad-hoc, user-specified temporal constraint within the dataset's lifetime. The problem is challenging because: (a) the time window is not known in advance, instead it is specified at query time, and (b) the MOD is continuously updated with new trajectories. Existing solutions first filter the trajectory database according to the temporal constraint, and then apply a clustering algorithm from scratch on the filtered data. However, this approach is extremely inefficient, when considering explorative data analysis where multiple clustering tasks need to be performed over different temporal subsets of the database, while the database is updated with new trajectories. To address this problem, we propose an incremental and scalable solution to the problem, which is built upon a novel indexing structure, called Representative Trajectory Tree (ReTraTree). ReTraTree acts as an effective spatiotemporal partitioning technique; partitions in ReTraTree correspond to groupings of sub-trajectories, which are incrementally maintained and assigned to representative (sub-)trajectories. Due to the proposed organization of sub-trajectories, the problem under study can be efficiently solved as simply as executing a query operator on ReTraTree, while insertion of new trajectories is supported. Our extensive experimental study performed on real and synthetic datasets shows that our approach outperforms a state-of-the-art in-DBMS solution supported by PostgreSQL by orders of magnitude.
Joining trajectory datasets is a significant operation in mobility data analytics and the cornerstone of various methods that aim to extract knowledge out of them. In the era of Big Data, the production of mobility data has become massive and, consequently, performing such an operation in a centralized way is not feasible. In this paper, we address the problem of Distributed Subtrajectory Join processing by utilizing the MapReduce programming model. Compared to traditional trajectory join queries, this problem is even more challenging since the goal is to retrieve all the "maximal" portions of trajectories that are "similar". We propose three solutions: (i) a well-designed basic solution, coined DTJb, (ii) a solution that uses a preprocessing step that repartitions the data, labeled DTJr , and (iii) a solution that, additionally, employs an indexing scheme, named DTJi . In our experimental study, we utilize a 56GB dataset of real trajectories from the maritime domain, which, to the best of our knowledge, is the largest real dataset used for experimentation in the literature of trajectory data management. The results show that DTJi performs up to 16× faster compared with DTJb, 10× faster than DTJr and 3× faster than the closest related state of the art algorithm.
Abstract. During the last decade, the domain of mobility data mining has emerged providing many effective methods for the discovery of intuitive patterns representing collective behavior of trajectories of moving objects. Although a few real-world trajectory datasets have been made available recently, these are not sufficient for experimentally evaluating the various proposals, therefore, researchers look to synthetic trajectory generators. This case is problematic because, on the one hand, real datasets are usually small, which compromises scalability experiments, and, on the other hand, synthetic dataset generators have not been designed to produce mobility pattern driven trajectories. Motivated by this observation, we present Hermoupolis, an effective generator of synthetic trajectories of moving objects that has the main objective that the resulting datasets support various types of mobility patterns (clusters, flocks, convoys, etc.), as such producing datasets with available ground truth information.
Trajectory clustering is an important operation of knowledge discovery from mobility data. Especially nowadays, the need for performing advanced analytic operations over massively produced data, such as mobility traces, in efficient and scalable ways is imperative. However, discovering clusters of complete trajectories can overlook significant patterns that exist only for a small portion of their lifespan. In this paper, we address the problem of Distributed Subtrajectory Clustering in an efficient and highly scalable way. The problem is challenging because the subtrajectories to be clustered are not known in advance, but they need to be discovered dynamically based on adjacent subtrajectories in space and time. Towards this objective, we split the original problem to three sub-problems, namely Subtrajectory Join, Trajectory Segmentation and Clustering and Outlier Detection, and deal with each one in a distributed fashion by utilizing the MapReduce programming model. The efficiency and the effectiveness of our solution is demonstrated experimentally over a synthetic and two large real datasets from the maritime and urban domains and through comparison with two state of the art subtrajectory clustering algorithms.
In this paper, we present an efficient in-DBMS framework for progressive time-aware sub-trajectory cluster analysis. In particular, we address two variants of the problem: (a) spatiotemporal sub-trajectory clustering and (b) index-based time-aware clustering at querying environment. Our approach for (a) relies on a two-phase process: a voting-and-segmentation phase followed by a sampling-and-clustering phase. Regarding (b), we organize data into partitions that correspond to groups of sub-trajectories, which are incrementally maintained in a hierarchical structure. Both approaches have been implemented in Hermes@PostgreSQL, a real Moving Object Database engine built on top of PostgreSQL, enabling users to perform progressive cluster analysis via simple SQL. The framework is also extended with a Visual Analytics (VA) tool to facilitate real world analysis.
During the last decades a number of effective methods for indexing, query processing, and knowledge discovery in moving object databases have been proposed. An interesting research direction that has recently emerged handles semantics of movement instead of raw spatio-temporal data. Semantic annotations, such as 'stop', 'move', 'at home', 'shopping', 'driving', etc., are either declared by the users (e.g. through social network apps) or automatically inferred by some annotation method and are typically presented as textual counterparts along with spatial and temporal information of raw trajectories. It is natural to argue that such 'spatio-temporal-textual' sequences, called semantic trajectories, form a realistic representation model of the complex everyday life (hence, mobility) of individuals. Towards handling semantic trajectories of moving objects in Semantic Mobility Databases (SMD), the lack of real datasets leads to the need of designing realistic simulators. In the context of the above discussion, the goal of this work is to realistically simulate the mobility life of a large-scale population of moving objects in an urban environment. Two simulator variations are presented: the core Hermoupolis simulator is parametric-driven (i.e., user-defined parameters tune every movement aspect), whereas the expansion of the former, called Hermoupolis by-example , follows the generate-by-example paradigm and is self-tuned by looking inside a real small (sample) dataset. We stress test our proposal and demonstrate its novel characteristics with respect to related work.
In this paper, we present a Big data framework for the prediction of streaming trajectory data by exploiting mined patterns of trajectories, allowing accurate long-term predictions with low latency. In particular, to meet this goal we follow a two-step methodology. First, we efficiently identify the hidden mobility patterns in an offline manner. Subsequently, the trajectory prediction algorithm exploits these patterns in order to prolong the temporal horizon of useful predictions. The experimental study is based on real-world aviation and maritime datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.