Collections of time-series data appear in a wide variety of contexts. To gain insight into the underlying phenomenon that the data represents, one must analyze the time-series data. Analysis can quickly become challenging for very large data-sets (∼terabytes or more), and it may be infeasible to scan the entire data-set on each query due to time limits or resource constraints. To avoid this problem, one might pre-compute partial results by scanning the data-set (usually as the data arrives). However, for complex queries, where the value of a new data record depends on all of the data previously seen, this might be infeasible because incorporating a large amount of historical data into a query requires a large amount of storage. We present an approach to performing complex queries over very large data-sets in a manner that is (i) practical, meaning that a query does not require a scan of the entire data-set, and (ii) fixed-cost, meaning that the amount of storage required depends only on the time-range spanned by the entire data-set (and not on the size of the data-set itself). We evaluate our approach with three different data-sets: (i) a 4-year commercial analytics data-set from a production content-delivery platform with over 15 million mobile users, (ii) an 18-year data-set from the Linux kernel commit history, and (iii) an 8-day data-set from Common Crawl HTTP logs. Our evaluation demonstrates the feasibility and practicality of our approach for a diverse set of complex queries on a diverse set of very large data-sets.
There is no consolidated, integral, quantifiable, granular, up-to-date source of information for the roads we traverse and the environment we live in every day. This leads to ambiguity about road conditions, which is tolerable under normal conditions but extremely problematic in adverse conditions such as snow blockages, water-logging due to storms, degraded roads and potholes. Without such knowledge, city authorities cannot take effective action against such problems. Moreover, a driver only has knowledge of the car's immediate surroundings, and not of what to expect further down the road. Our approach is to deploy a number of embedded modules capable of sensing, computing and reporting, each of which can simply be plugged into any vehicle. This gives each vehicle connectivity to the cloud and provides larger coverage than static sensors. The data reported by any single module may be error-prone, so the cloud crowdsources the data from these modules and merges it to increase confidence in the information. Our work, Panoptes, demonstrates these aspects through crowdsourced pothole detection for city roads.
YinzCam is a cloud-hosted service that provides sports fans with real-time scores, news, photos, statistics, live radio, streaming video, etc., on their mobile devices. YinzCam’s infrastructure is currently hosted on Amazon Web Services (AWS) and supports over 30 million installs of the official mobile apps of 140+ NHL/NFL/NBA/NRL/NCAA sports teams and venues. YinzCam’s workload is necessarily multi-modal (e.g., pre-game, in-game, post-game, game-day, non-game-day), with normal game-time traffic being twenty times that on non-game days. This paper describes the evolution of YinzCam’s production architecture and distributed infrastructure, from its beginnings in 2009, when it was used to support thousands of concurrent users, to today’s system that supports millions of concurrent users on any game day. We also discuss key new opportunities to improve the fan experience inside the stadium of the future, without impacting the available bandwidth, by crowdsourcing the thousands of mobile devices that are in fans’ hands inside these venues. We present Krowd, a novel distributed key-value store for promoting efficient content sharing, discovery and retrieval across the mobile devices inside a stadium. We also present CHIPS, a system that ensures that users’ privacy is maintained while their devices participate in the crowdsourced infrastructure.