Beyond one billion time series: indexing and mining very large time series collections with iSAX2+

Camerra, Alessandro; Shieh, Jin; Palpanas, Themis; Rakthanmanon, Thanawin; Keogh, Eamonn

doi:10.1007/s10115-012-0606-6

Cited by 89 publications

(152 citation statements)

References 25 publications

Mentioning: 152

“…Most methods are based on longest common sub-sequence algorithm [17], [18]. However, these methods are not ideal for IoT/M2M data for two main reasons.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Improve IoT/M2M Data Organization Based on Stream Patterns

Antunes, Jesus, Gomes, et al. 2017

2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud)


The increasing number of small, cheap devices full of sensing capabilities leads to an untapped source of information that can be explored to improve and optimize several systems. Yet, as this number grows, it becomes increasingly difficult to manage and organize all this new information. The lack of a standard context representation scheme is one of the main difficulties in this research area. With this in mind, we propose a tailored generative stream model with two main uses: stream similarity and generation. Sensor data can be organized based on pattern similarity, which can be estimated using the proposed model. The proposed stream model will be used in conjunction with our context organization model, in which we aim to provide an automatic organizational model without enforcing specific representations. Moreover, the model can be used to generate streams in a controlled environment, which is useful for validating, evaluating and testing any platform that deals with IoT/M2M devices.


“…For instance, iSAX requires more than 6 days to index 100 million (10^8) time-series data [Camerra et al 2010]. However, [Camerra et al 2014] argue that it requires two days to build the same data size and 20 days to build 500 million (5 × 10^8) time-series data. iSAX requires a long time to build indexes because two main reasons: a) Inefficient splitting policy b) No bulk loading scheme.…”

Section: Symbolic Data Indexing Approach (mentioning)

confidence: 99%

“…However, similar to SAX, determining iSAX parameters relies heavily on the data. Moreover, once the root's and the child nodes' representations are constructed, it is not possible to update them [Camerra et al 2014], which is a constraint; especially if we consider using iSAX in indexing on-line time-series data which requires the indexing mechanism to be continuously updated with no prior knowledge of the data size. Intuitively, iSAX does not allow the child nodes to be represented by a higher cardinality once they are created.…”

Section: Thematic Data (mentioning)

confidence: 99%

Large-Scale Indexing, Discovery, and Ranking for the Internet of Things (IoT)

2018


Network-enabled sensing and actuation devices are key enablers to connect real-world objects to the cyber world. The Internet of Things (IoT) consists of the network-enabled devices and communication technologies that allow connectivity and integration of physical objects (Things) into the digital world (Internet). Enormous amounts of dynamic IoT data are collected from Internet-connected devices. IoT data is usually multi-variant streams that are heterogeneous, sporadic, multi-modal and spatio-temporal. IoT data can be disseminated with different granularities and have diverse structures, types and qualities. Dealing with the data deluge from heterogeneous IoT resources and services imposes new challenges on indexing, discovery and ranking mechanisms that will allow building applications that require on-line access and retrieval of ad-hoc IoT data. However, the existing IoT data indexing and discovery approaches are complex or centralised which hinders their scalability. The primary objective of this paper is to provide a holistic overview of the state-of-the-art on indexing, discovery and ranking of IoT data. The paper aims to pave the way for researchers to design, develop, implement and evaluate techniques and approaches for on-line large-scale distributed IoT applications and services.
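The iSAX statements quoted above from this survey turn on the cardinality of node representations. As a rough illustration only, the sketch below computes a SAX word for a series at a chosen cardinality (2^bits symbols per segment) and shows that a lower-cardinality symbol is a bit prefix of the higher-cardinality one. It is not the implementation from the indexed article: the function names (`znorm`, `paa`, `breakpoints`, `sax_word`, `demote`) are hypothetical, symbols are numbered in ascending order of value as an assumption, and the series length is assumed to be a multiple of the segment count.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series (zero mean, unit population standard deviation)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be a multiple of segments")
    width = n // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(bits):
    """Cut points splitting the standard normal into 2**bits equiprobable regions."""
    card = 1 << bits
    return [NormalDist().inv_cdf(k / card) for k in range(1, card)]


def sax_word(series, segments, bits):
    """SAX word: one integer symbol in [0, 2**bits) per segment."""
    cuts = breakpoints(bits)
    return [bisect_right(cuts, v) for v in paa(znorm(series), segments)]


def demote(symbol, from_bits, to_bits):
    """Lower a symbol's cardinality by dropping its least significant bits."""
    return symbol >> (from_bits - to_bits)


if __name__ == "__main__":
    series = [1, 2, 3, 4, 5, 6, 7, 8]
    high = sax_word(series, segments=4, bits=3)
    low = sax_word(series, segments=4, bits=2)
    print(high, low, [demote(s, 3, 2) for s in high])
```

In this sketch, demoting the 3-bit word reproduces the 2-bit word because the breakpoints for 2^b regions are a subset of those for 2^(b+1) regions. Going the other way, from a lower to a higher cardinality, requires the underlying series values, which is one way to read the constraint on child nodes described in the quoted statement.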


“…At the same time we have witnessed an increased interest in data series management and processing [32,23,22,9], related to data produced by sensors, or scientific experiments.…”

Section: Related Work (mentioning)

confidence: 99%

“…Let D be a dataset with N = 4 and m = 3. Let S1, S2, S3 and S4 be the instantiated distance partitions: S1 = {[2, 2] : 0.33, [4, 4] : 0.33, [6, 6] : 0.33}, S2 = {[4, 8] : 1}, S3 = {[1, 1] : 0.33, [5, 5] : 0.33, [9, 9] : 0.33} and S4 = {[1, 1] : 0.33, [3, 3] : 0.33, [7, 7] : 0.33}. The PNN probability estimates determined using the Eq.4 and Eq.…”

Section: Lemma 2 (Dependencies In Distance Partitions) (mentioning)

confidence: 99%

Top-k nearest neighbor search in uncertain data series

2014

Self Cite


Many real applications consume data that is intrinsically uncertain, noisy and error-prone. In this study, we investigate the problem of finding the top-k nearest neighbors in uncertain data series, which occur in several different domains. We formalize the top-k nearest neighbor problem for uncertain data series, and describe a model for uncertain data series that captures both uncertainty and correlation. This distinguishes our approach from prior work that compromises the accuracy of the model by assuming independence of the value distribution at neighboring time-stamps. We introduce the Holistic-PkNN algorithm, which uses novel metric bounds for uncertain series and an efficient refinement strategy to reduce the overall number of required probability estimates. We evaluate our proposal under a variety of settings using a combination of synthetic and 45 real datasets from diverse domains. The results demonstrate the significant advantages of the proposed approach.
