Ugo Vespier scite author profile

et al. 2014

Fitting sensors to humans and physical structures is becoming more and more common. These developments provide many opportunities for ubiquitous computing, as well as challenges for analyzing the resulting sensor data. From these challenges, an underappreciated problem arises: modeling multivariate time series with mixed sampling rates. Although mentioned in several application papers using sensor systems, this problem has been left almost unexplored, often hidden in a preprocessing step or solved manually as a one-pass procedure (feature extraction/construction). This leaves an opportunity to formalize and develop methods that address mixed sampling rates in an automatic fashion. We approach the problem of dealing with multiple sampling rates from an aggregation perspective. We propose Accordion, a new embedded method that constructs and selects aggregate features iteratively, in a memory-conscious fashion. Our algorithms work on both classification and regression problems. We describe three experiments on real-world time series datasets, with satisfying results.
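To make the aggregation perspective concrete, here is a minimal Python sketch of summarising a high-rate sensor stream into one feature row per low-rate sample. It is not the Accordion algorithm: the function name `aggregate_features`, the fixed `ratio` between the two sampling rates, and the choice of mean, max, and standard deviation are all assumptions made for illustration, and the iterative feature selection described in the abstract is not shown.

```python
import numpy as np

def aggregate_features(high_rate, ratio):
    """Summarise each block of `ratio` high-rate samples with simple aggregates.

    `ratio` is the (assumed constant) number of high-rate samples per
    low-rate sample; an incomplete trailing block is dropped.
    """
    x = np.asarray(high_rate, dtype=float)
    n_windows = len(x) // ratio
    blocks = x[: n_windows * ratio].reshape(n_windows, ratio)
    return {
        "mean": blocks.mean(axis=1),
        "max": blocks.max(axis=1),
        "std": blocks.std(axis=1),
    }

# Seven high-rate samples, three per low-rate step: two complete windows.
features = aggregate_features([1, 2, 3, 4, 5, 6, 7], ratio=3)
```

Each array in `features` has one entry per low-rate step, so it can be placed alongside the low-rate target as a candidate predictor.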

MDL-Based Analysis of Time Series at Multiple Time-Scales

Knobbe

Nijssen

et al. 2012

Abstract. The behavior of many complex physical systems is affected by a variety of phenomena occurring at different temporal scales. Time series data produced by measuring properties of such systems often mirrors this fact by appearing as a composition of signals across different time scales. When the final goal of the analysis is to model the individual phenomena affecting a system, it is crucial to be able to recognize the right temporal scales and to separate the individual components of the data. In this paper, we approach this challenge through a combination of the Minimum Description Length (MDL) principle, feature selection strategies, and convolution techniques from the signal processing field. As a result, our algorithm produces a good decomposition of a given time series and, as a side effect, builds a compact representation of its identified components. Experiments demonstrate that our method manages to identify correctly both the number and the temporal scale of the components for real-world as well as artificial data and show the usefulness of our method as an exploratory tool for analyzing time series data.
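As a rough illustration of the convolution step only, the following Python sketch separates a series into a slow component and a fast residual by smoothing with a Gaussian kernel. The helper names `gaussian_kernel` and `split_by_scale`, the three-sigma kernel radius, and the edge padding are assumptions; the MDL-based selection of which scales to keep, which is the core of the paper, is not reproduced here.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Discrete Gaussian kernel truncated at three standard deviations."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()  # normalise so a constant signal is left unchanged

def split_by_scale(x, sigma):
    """Return (slow, fast): the smoothed series and the residual."""
    x = np.asarray(x, dtype=float)
    k = gaussian_kernel(sigma)
    # Pad with edge values so the output has the same length as the input.
    padded = np.pad(x, len(k) // 2, mode="edge")
    slow = np.convolve(padded, k, mode="valid")
    return slow, x - slow

# Synthetic series: a slow oscillation plus a faster, weaker one.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 100) + 0.2 * np.sin(2 * np.pi * t / 5)
slow, fast = split_by_scale(x, sigma=5)
```

By construction `slow + fast` reproduces `x`, so repeating the split on `slow` with a larger `sigma` would give a simple multi-scale decomposition.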

Mining characteristic multi-scale motifs in sensor-based time series

Nijssen

Knobbe

2013

More and more, physical systems are being fitted with various kinds of sensors in order to monitor their behavior, health or intensity of use. The large quantities of time series data collected from these complex systems often exhibit two important characteristics: the data is a combination of various superimposed effects operating at different time scales, and each effect shows a fair degree of repetition. Each of these effects can be described by a small collection of motifs: recurring temporal patterns in the data. We propose a method to discover characteristic and potentially overlapping motifs at multiple time scales, taking into account systemic deformations and temporal warping. Our method is based on a combination of scale-space theory and the Minimum Description Length principle. We show its effectiveness on two time series datasets from real world applications.

Predefined pattern detection in large time series

Miao

Cachucho

et al. 2016

Information Sciences

Abstract. Predefined pattern detection from time series is an interesting and challenging task. In order to reduce its computational cost and increase effectiveness, a number of time series representation methods and similarity measures have been proposed. Most of the existing methods focus on full sequence matching, that is, sequences with clearly defined beginnings and endings, where all data points contribute to the match. These methods, however, do not account for temporal and magnitude deformations in the data and prove ineffective in several real-world scenarios where noise and external phenomena introduce diversity in the class of patterns to be matched. In this paper, we present a novel pattern detection method, which is based on the notions of templates, landmarks, constraints and trust regions. We employ the Minimum Description Length (MDL) principle in the time series preprocessing step, which helps to preserve all the prominent features and prevents the template from overfitting. Templates are provided by common users or domain experts, and represent interesting patterns we want to detect in time series. Instead of utilising templates to match all the potential subsequences in the time series, we translate the time series and templates into landmark sequences, and detect patterns in the landmark sequence of the time series. Through defining constraints within the template landmark sequence, we effectively extract all the landmark subsequences from the time series landmark sequence, and obtain a number of landmark segments (time series subsequences or instances). We model each landmark segment through scaling the template in both temporal and magnitude dimensions. To suppress the influence of noise, we introduce the concept of a trust region, which not only helps to achieve an improved instance model, but also helps to capture the accurate boundaries of instances of the given template.
Based on the similarities derived from instance models, we introduce a probability density function to calculate a similarity threshold. The threshold can be used to judge whether a landmark segment is a true instance of the given template. To evaluate the effectiveness and efficiency of the proposed method, we apply it to two real-world datasets. The results show that our method is capable of detecting patterns with temporal and magnitude deformations with competitive performance.

Traffic Events Modeling for Structural Health Monitoring

Knobbe

Vanschoren

et al. 2011

Abstract. Since 2008, a sensor network on a major Dutch highway bridge has been monitoring the structural health of the bridge, by measuring various parameters at different locations along the infrastructure. These parameters include strain, vibration and climate. The aim of the InfraWatch project is to model the health and behavior of the bridge by analyzing the large quantities of data that the sensors produce. One of the many forms of modeling involved is the identification of traffic events (cars, trucks, congestion and so on), as knowing when they occur, and of what nature they are, will enable modeling the response of the bridge to each of these events. In this paper, we approach the problem as a time series subsequence clustering problem. As it is known that such a clustering method can be problematic on certain types of time series, we verified known problems on the InfraWatch data. Indeed some of the undesired phenomena occurred in our case, but to a lesser extent than previously suggested. We introduce a new distance measure over subsequences that discourages the observed behavior and allows us to identify traffic events reliably, even on large quantities of data.

Large-Scale Sensor Network Analysis

Vanschoren

Miao

et al.

Sensors are increasingly being used to monitor the world around us. They measure movements of structures such as bridges, windmills, and plane wings, human vital signs, atmospheric conditions, and fluctuations in power and water networks. In many cases, this results in large networks with different types of sensors, generating impressive amounts of data. As the volume and complexity of data increase, their effective use becomes more challenging, and novel solutions are needed on both a technical and a scientific level. Founded on several real-world applications, this chapter discusses the challenges involved in large-scale sensor data analysis and describes practical solutions to address them. Due to the sheer size of the data and the large amount of computation involved, these are clearly “Big Data” applications.