Schema-agnostic indexing with Azure DocumentDB

Shukla, Dharma; Thota, Shireesh; Raman, Karthik; Gajendran, Madhan; Shah, Abdul Qadir; Ziuzin, Sergii; Sundaram, Krishnan; Guajardo, Miguel Gonzalez; Wawrzyniak, Anna; Boshra, Samer; Ferreira, Renato; Nassar, Mohamed; Koltachev, Michael; Ji, Huang; Sengupta, Sudipta; Levandoski, Justin J.; Lomet, David

doi:10.14778/2824032.2824065

Cited by 32 publications

(15 citation statements)

References 8 publications


“…A MongoDB [67] distributed database has been used for persistent data storage. We have used MongoDB Atlas [68] and Azure DocumentDB [69]. While both of them had a very good response time (less than 1 ms), Document DB proved to be very expensive as it is charged per request.…”

Section: Discussion and Results (mentioning)

confidence: 99%

A Model for the Remote Deployment, Update, and Safe Recovery for Commercial Sensor-Based IoT Systems

Radovici, Culic, Rosner, et al. 2020

Sensors


Internet of Things (IoT) systems deployments are becoming both ubiquitous and business critical in numerous business verticals, both for process automation and data-driven decision-making based on distributed sensors networks. Beneath the simplicity offered by these solutions, we usually find complex, multi-layer architectures—from hardware sensors up to data analytics systems. These rely heavily on software running on the on-location gateway devices designed to bridge the communication between the sensors and the cloud. This will generally require updates and improvements—raising deployment and maintenance challenges. Especially for large scale commercial solutions, a secure and fail-safe updating system becomes crucial for a successful IoT deployment. This paper explores the specific challenges for infrastructures dedicated to remote application deployment and management, addresses the management challenges related to IoT sensors systems, and proposes a mathematical model and a methodology for tackling this. To test the model’s efficiency, we implemented it as a software infrastructure system for complete commercial IoT products. As proof, we present the deployment of 100 smart soda dispensing machines in three locations. Each machine relies on sensors monitoring its status and on gateways controlling its behaviour, each receiving 133 different remote software updates through our solution. In addition, 80% of the machines ran non-interrupted for 250 days, with 20% failing due to external factors; out of the 80%, 30% experienced temporary update failures due to reduced hardware capabilities and the system successfully performed automatic rollback of the system, thus recovering in 100% of the temporary failures


“…We begin with a review of existing CAS indexes [8,20,23,25,39]. IndexFabric [8] prioritizes the structure of the data over its values.…”

Section: Related Work (mentioning)

confidence: 99%

“…As real-world BOMs grow to tens of millions of nodes [11], we need dedicated CAS access methods to support the efficient processing of CAS queries. Existing CAS indexes often lead to large intermediate results, since they either build separate indexes for, respectively, content and structure [25] or prioritize one dimension over the other (i.e., content over structure or vice versa) [2,8,39]. We propose a well-balanced integration of paths and values in a single index that provides robust performance for CAS queries, meaning that the index prioritizes neither paths nor values.…”

Section: Introduction (mentioning)

confidence: 99%

Dynamic interleaving of content and structure for robust indexing of semi-structured hierarchical data

2020


We propose a robust index for semi-structured hierarchical data that supports content-and-structure (CAS) queries specified by path and value predicates. At the heart of our approach is a novel dynamic interleaving scheme that merges the path and value dimensions of composite keys in a balanced way. We store these keys in our trie-based Robust Content-And-Structure index, which efficiently supports a wide range of CAS queries, including queries with wildcards and descendant axes. Additionally, we show important properties of our scheme, such as robustness against varying selectivities, and demonstrate improvements of up to two orders of magnitude over existing approaches in our experimental evaluation.


“…RAMCloud [41], FaRM-KV [14], HBase [20], Cassandra [28], LevelDB [23], and RocksDB [16]) all of which show good get/put performance, but have difficulties to process scans with a competitive performance. Another interesting line of related work are document stores, like DocumentDB [42] or MongoDB [32]. Like Cassandra, they offer some scans with secondary indexes, specifically tuned to the document-related use-cases.…”

Section: Related Work (mentioning)

confidence: 99%

Fast scans on key-value stores

et al. 2017


Key-Value Stores (KVS) are becoming increasingly popular because they scale up and down elastically, sustain high throughputs for get/put workloads and have low latencies. KVS owe these advantages to their simplicity. This simplicity, however, comes at a cost: It is expensive to process complex, analytical queries on top of a KVS because today's generation of KVS does not support an efficient way to scan the data. The problem is that there are conflicting goals when designing a KVS for analytical queries and for simple get/put workloads: Analytical queries require high locality and a compact representation of data whereas elastic get/put workloads require sparse indexes. This paper shows that it is possible to have it all, with reasonable compromises. We studied the KVS design space and built TellStore, a distributed KVS, that performs almost as well as state-of-the-art KVS for get/put workloads and orders of magnitude better for analytical and mixed workloads. This paper presents the results of comprehensive experiments with an extended version of the YCSB benchmark and a workload from the telecommunication industry.

