Abstract. With the exponential growth of moving objects data to the Gigabyte range, it has become critical to develop effective techniques for indexing, updating, and querying these massive data sets. To meet the high update rate as well as low query response time requirements of moving object applications, this paper takes a novel approach in moving object indexing. In our approach, we do not require a sophisticated index structure that needs to be adjusted for each incoming update. Rather, we construct conceptually simple short-lived throwaway indexes which we only keep for a very short period of time (sub-seconds) in main memory. As a consequence, the resulting technique MOVIES supports high query rates and high update rates at the same time, at the cost of query result staleness. Moreover, MOVIES is the first main memory method supporting time-parameterized predictive queries. To support this feature we present two algorithms: non-predictive MOVIES and predictive MOVIES. We obtain the surprising result that a predictive indexing approach, considered state-of-the-art in an external-memory scenario, does not scale well in a main memory environment. In fact, our results show that MOVIES outperforms state-of-the-art moving object indexes like a main-memory adapted Bx-tree by orders of magnitude w.r.t. update rates and query rates. Finally, our experimental evaluation uses a workload unmatched by any previous work. We index the complete road network of Germany consisting of 40,000,000 road segments and 38,000,000 nodes. We scale our workload up to 100,000,000 moving objects, 58,000,000 updates per second and 10,000 queries per second.
With the exponential growth of moving objects data to the Gigabyte range, it has become critical to develop effective techniques for indexing, updating, and querying these massive data sets. To meet the high update rate as well as low query response time requirements of moving object applications, this paper takes a novel approach in moving object indexing. In our approach, we do not require a sophisticated index structure that needs to be adjusted for each incoming update. Rather, we construct conceptually simple short-lived index images that we only keep for a very short period of time (sub-seconds) in main memory. As a consequence, the resulting technique MOVIES supports at the same time high query rates and high update rates, trading this property for query result staleness. Moreover, MOVIES is the first main memory method supporting time-parameterized predictive queries. To support this feature, we present two algorithms: non-predictive MOVIES and predictive MOVIES. We obtain the surprising result that a predictive indexing approach, considered state-of-the-art in an external-memory scenario, does not scale well in a main memory environment. In fact, our results show that MOVIES outperforms state-of-the-art moving object indexes such as a main-memory adapted Bx-tree by orders of magnitude w.r.t. update rates and query rates. In our experimental evaluation, we index the complete road network of Germany consisting of 40,000,000 road segments and 38,000,000 nodes. We scale our workload up to 100,000,000 moving objects, 58,000,000 updates per second and 10,000 queries per second, a scenario at a scale unmatched by any previous work.
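The idea of a short-lived, read-only index image can be illustrated with a minimal Python sketch. This is not the MOVIES implementation: the one-dimensional positions, the sorted-list index, and the names `build_index`, `range_query`, and `update_buffer` are all assumptions made for illustration. The sketch only shows the trade-off described above: updates never touch the current image, queries may return stale results, and a fresh image replaces the old one at the end of each time frame.

```python
from bisect import bisect_left, bisect_right

def build_index(positions):
    """Build a read-only index image: (x, object_id) pairs sorted by x.

    Assumption: positions are one-dimensional, purely for illustration.
    """
    return sorted((x, oid) for oid, x in positions.items())

def range_query(index, lo, hi):
    """Return ids of objects with lo <= x <= hi in the given index image."""
    start = bisect_left(index, (lo, float("-inf")))
    end = bisect_right(index, (hi, float("inf")))
    return [oid for _, oid in index[start:end]]

# Latest known position per object (hypothetical data).
positions = {1: 10.0, 2: 25.0, 3: 40.0}
index = build_index(positions)  # image for the current time frame

# Updates arriving during the frame are only buffered; the image is untouched.
update_buffer = [(2, 55.0), (4, 30.0)]

# Queries in this frame are answered from the old image, so results are stale.
stale = range_query(index, 20.0, 45.0)

# End of frame: apply the buffer, build a new image, throw the old one away.
positions.update(update_buffer)
index = build_index(positions)
fresh = range_query(index, 20.0, 45.0)

print(stale)  # [2, 3]
print(fresh)  # [4, 3]
```

Because an image is never modified after it is built, no per-update index maintenance or locking is needed in this sketch; the price is that object 2 is still reported at its old position until the next image is built.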
The joint EDBT/ICDT conference (International Conference on Extending Database Technology / International Conference on Database Theory) is a well-established conference series on data management, with annual meetings in the second half of March that attract 250 to 300 delegates. Three weeks before EDBT/ICDT 2020 was planned to take place in Copenhagen, the rapidly developing COVID-19 pandemic led to the decision to cancel the face-to-face event. In the interest of the research community, it was decided to move the conference online while trying to preserve as much of the real-life experience as possible. As far as we know, we are one of the first conferences that moved to a fully synchronous online experience due to the COVID-19 outbreak. By fully synchronous, we mean that participants jointly listened to presentations, had live Q&A, and attended other live events associated with the conference. In this report, we share our decisions, experiences, and lessons learned.
Massive amounts of satellite data have been gathered over time, holding the potential to unveil a spatiotemporal chronicle of the surface of Earth. These data allow scientists to investigate various important issues, such as land use changes, on a global scale. However, not all land-use phenomena are equally visible on satellite imagery. In particular, the creation of an inventory of the planet's road infrastructure remains a challenge, despite being crucial to analyze urbanization patterns and their impact. Towards this end, this work advances data-driven approaches for the automatic identification of roads based on open satellite data. Given the typical resolutions of these historical satellite data, we observe that there is inherent variation in the visibility of different road types. Based on this observation, we propose two deep learning frameworks that extend state-of-the-art deep learning methods by formalizing road detection as an ordinal classification task. In contrast to related schemes, one of the two models also resorts to satellite time series data that are potentially affected by missing data and cloud occlusion. Taking these time series data into account eliminates the need to manually curate datasets of high-quality image tiles, substantially simplifying the application of such models on a global scale. We evaluate our approaches on a dataset that is based on Sentinel-2 satellite imagery and OpenStreetMap vector data. Our results indicate that the proposed models can successfully identify large and medium-sized roads. We also discuss opportunities and challenges related to the detection of roads and other infrastructure on a global scale.
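To make the notion of an ordinal classification task concrete, the following Python sketch shows one common way to encode ordered classes as cumulative binary targets. It is an illustration only: the abstract does not state which ordinal formulation the two frameworks use, and the class names in `ROAD_CLASSES`, their order, and the helper names `encode_ordinal` and `decode_ordinal` are hypothetical.

```python
# Assumed ordering of classes by visibility; not taken from the paper.
ROAD_CLASSES = ["no road", "small", "medium", "large"]

def encode_ordinal(rank, num_classes=len(ROAD_CLASSES)):
    """Encode class rank k as K-1 binary targets; target j is 1 if rank > j."""
    return [1 if rank > j else 0 for j in range(num_classes - 1)]

def decode_ordinal(probs, threshold=0.5):
    """Predicted rank is the number of leading outputs at or above threshold."""
    rank = 0
    for p in probs:
        if p < threshold:
            break
        rank += 1
    return rank

print(encode_ordinal(0))                 # [0, 0, 0]
print(encode_ordinal(2))                 # [1, 1, 0]
print(decode_ordinal([0.9, 0.7, 0.2]))   # 2
print(ROAD_CLASSES[decode_ordinal([0.9, 0.7, 0.2])])  # medium
```

Unlike a plain one-hot encoding, this encoding makes a prediction of "small" for a "medium" road differ from the target in only one position, while "no road" differs in two, so errors between neighbouring classes are penalised less than errors between distant ones.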
Microservices have become a popular architectural style for data-driven applications, given their ability to functionally decompose an application into small and autonomous services to achieve scalability, strong isolation, and specialization of database systems to the workloads and data formats of each service. Despite the accelerating industrial adoption of this architectural style, an investigation of the state of the practice and challenges practitioners face regarding data management in microservices is lacking. To bridge this gap, we conducted a systematic literature review of representative articles reporting the adoption of microservices, we analyzed a set of popular open-source microservice applications, and we conducted an online survey to cross-validate the findings of the previous steps with the perceptions and experiences of over 120 experienced practitioners and researchers. Through this process, we were able to categorize the state of the practice of data management in microservices and observe several foundational challenges that cannot be solved by software engineering practices alone, but rather require system-level support to alleviate the burden imposed on practitioners. We discuss the shortcomings of state-of-the-art database systems regarding microservices and we conclude by devising a set of features for microservice-oriented database systems.
Online Analytical Processing (OLAP) has been a field of competing technologies for the past ten years. One of the still unsolved challenges of OLAP is how to provide quick response times on any Terabyte-sized business data problem. Recently, a very clever multi-dimensional index structure termed Dwarf [26] has been proposed offering excellent query response times as well as unmatched index compression rates. The proposed index seems to scale well for both large data sets and high dimensions. Motivated by these surprisingly excellent results, we take a look into the rearview mirror. We have re-implemented the Dwarf index from scratch and make three contributions. First, we successfully repeat several of the experiments of the original paper. Second, we substantially correct some of the experimental results reported by the inventors. Some of our results differ by orders of magnitude. To better understand these differences, we provide additional experiments that better explain the behavior of the Dwarf index. Third, we provide missing experiments comparing Dwarf to baseline query processing strategies. This should give practitioners a better guideline to understand for which cases Dwarf indexes could be useful in practice.
Abstract. Dataspace applications necessitate the creation of associations among data items over time. For example, once information about people is extracted from sources such as webpages and blogs, associations among them may emerge as a consequence of different criteria, such as their city of origin, their elected hobbies, or their age group. In a set of personal data sources, we may wish to associate documents and emails based on their modification dates or their authors. In this paper, we advocate a declarative approach to specifying these associations. We propose that each set of associations be defined by an association trail. An association trail is a query-based definition of how items are connected by intensional (i.e., virtual) association edges to other items in the dataspace. The benefit of this mechanism is the creation of an intensional graph of associations among previously disconnected data items coming from different data sources. We study in detail the problem of processing neighborhood queries over these intensional association graphs. The naive approach to neighborhood query processing over intensional graphs is to materialize the whole graph and then apply previous work on dataspace graph indexing to answer queries. As the intensional graph may have a number of edges quadratic in its number of nodes, the naive approach has worst-case quadratic indexing cost. We develop in this paper a novel indexing technique, the grouping-compressed index (GCI), that exploits association trail definitions to materialize the same intensional graph with linear cost. In addition, we present a query answering algorithm over GCI that avoids decompressing the graph to its quadratic size. In our experimental evaluation, GCI is shown to provide an order of magnitude gain in indexing cost over the naive approach, while remaining competitive in query processing time.
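The gap between quadratic and linear materialization can be seen in a small Python sketch. It is not the GCI from the paper: it assumes a single association trail that connects two items whenever they share the same value of a hypothetical `city` attribute, and the names `materialize_naive`, `build_grouped`, and `neighbors` are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical items with their city of origin.
items = {"a": "Zurich", "b": "Zurich", "c": "Zurich", "d": "Aarhus"}

def materialize_naive(items):
    """Store every edge explicitly: up to n*(n-1)/2 edges for n items."""
    return {frozenset(pair) for pair in combinations(items, 2)
            if items[pair[0]] == items[pair[1]]}

def build_grouped(items):
    """Store each item once, under its group key: n entries for n items."""
    groups = defaultdict(set)
    for item, city in items.items():
        groups[city].add(item)
    return groups

def neighbors(groups, items, item):
    """Answer a neighborhood query from the groups, without expanding edges."""
    return groups[items[item]] - {item}

edges = materialize_naive(items)
groups = build_grouped(items)

print(len(edges))                                   # 3
print(sum(len(g) for g in groups.values()))         # 4
print(sorted(neighbors(groups, items, "a")))        # ['b', 'c']
```

With only four items the naive edge set is still small, but a group of k items with the same city contributes k(k-1)/2 explicit edges and only k entries in the grouped form, and the neighborhood query reads one group instead of scanning the edge set.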