Dynamic Time Warping (DTW) is certainly the most relevant distance measure for time series analysis. However, its quadratic time complexity may hamper its use, mainly in the analysis of large time series data. All the recent advances in speeding up the exact DTW calculation are confined to similarity search. However, a significant number of important algorithms, including clustering and classification algorithms, require the pairwise distance matrix for all time series objects. The only techniques available to deal with this issue are constraint bands and DTW approximations. In this paper, we propose the first exact approach for speeding up the all-pairwise DTW matrix calculation. Our method may also be applied in conjunction with constraint bands. We demonstrate that our algorithm reduces the runtime by approximately 50% on average and by up to one order of magnitude on some datasets.
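To make the quadratic cost and the role of constraint bands concrete, here is a minimal sketch of the textbook DTW recurrence and of a naive all-pairs distance matrix. It is the baseline the abstract refers to, not the speedup the paper proposes. The function names, the squared point cost with a final square root, and the `window` parameter (the half-width of a Sakoe-Chiba band) are assumptions made for illustration.

```python
import math


def dtw_distance(a, b, window=None):
    """Textbook DTW between two numeric sequences.

    `window` is a hypothetical parameter: the half-width of a
    Sakoe-Chiba constraint band. None means no band.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide to reach the final cell.
    window = max(window, abs(n - m))
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessors.
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def pairwise_dtw(series, window=None):
    """Naive all-pairs matrix: one full DTW call per unordered pair."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(series[i], series[j], window)
            dist[i][j] = dist[j][i] = d
    return dist
```

With `window=0` and equal-length inputs the band collapses to the diagonal, so the result equals the Euclidean distance. Widening the band trades runtime for warping flexibility.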
Insects have a close relationship with humanity, in both positive and negative ways. Mosquito-borne diseases kill millions of people, and insect pests consume and destroy around US $40 billion worth of food each year. In contrast, insects pollinate at least two-thirds of all the food consumed in the world. In order to control populations of disease vectors and agricultural pests, researchers in entomology have developed numerous methods, including chemical, biological and mechanical approaches. However, without knowledge of the exact location of the insects, the use of these techniques becomes costly and inefficient. We are developing a novel sensor as a tool to control disease vectors and agricultural pests. This sensor, which is built from inexpensive commodity electronics, captures insect flight information using laser light and classifies the insects according to their species. The use of machine learning techniques allows the sensor to automatically identify the species without human intervention. Finally, the sensor can provide real-time estimates of insect species with virtually no time gap between the insect identification and the delivery of population estimates. In this paper, we present our solution to the most important challenge in making this sensor practical: the creation of an accurate classification system. We show that, with the correct combination of feature extraction and machine learning techniques, we can achieve an accuracy of almost 90% in the task of identifying the correct insect species among nine species. Specifically, we show that we can achieve an accuracy of 95% in the task of correctly recognizing whether a given event was generated by a disease vector mosquito.
Interest in time series methods and techniques has increased enormously. Virtually every piece of information collected from human, natural, and biological processes is susceptible to changes over time, and the study of how these changes occur is a central issue in fully understanding such processes. Among all time series mining tasks, classification is likely to be the most prominent one. In time series classification, a significant body of empirical research indicates that the k-nearest neighbor rule in the time domain is very effective. However, certain time series features are not easily identified in this domain, and a change in representation may reveal some significant and unknown features. In this work, we propose the use of recurrence plots as a representation domain for time series classification. Our approach measures the similarity between recurrence plots using the Campana-Keogh (CK-1) distance, a Kolmogorov complexity-based distance that uses video compression algorithms to estimate image similarity. We show that recurrence plots combined with the CK-1 distance lead to significant improvements in accuracy rates compared to Euclidean distance and Dynamic Time Warping in several data sets. Although recurrence plots cannot provide the best accuracy rates for all data sets, we demonstrate that we can predict ahead of time that our method will outperform the time representation with Euclidean and Dynamic Time Warping distances.
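As an illustration of the representation discussed above, the following sketch builds a thresholded (binary) recurrence plot from a time-delay embedding. The function name and the `eps`, `dim`, and `delay` parameters are assumptions for this example. The abstract does not say which variant or settings the authors use, and the CK-1 distance itself is not reproduced here because it depends on a video codec.

```python
import math


def recurrence_plot(series, eps, dim=1, delay=1):
    """Binary recurrence plot: cell (i, j) is 1 when states i and j
    lie within `eps` of each other in the embedded space.

    `eps`, `dim`, and `delay` are hypothetical parameters chosen
    for illustration only.
    """
    # Time-delay embedding: state k is (x[k], x[k + delay], ...).
    n_states = len(series) - (dim - 1) * delay
    states = [
        [series[k + d * delay] for d in range(dim)]
        for k in range(n_states)
    ]
    plot = [[0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        # The plot is symmetric, so only the upper triangle is computed.
        for j in range(i, n_states):
            if math.dist(states[i], states[j]) <= eps:
                plot[i][j] = plot[j][i] = 1
    return plot
```

A strictly alternating series such as `[0, 1, 0, 1]` produces a checkerboard, which shows how periodic structure in the series becomes a visible texture in the plot.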
Time series are present in many pattern recognition applications related to medicine, biology, astronomy, economics, and others. In particular, the classification task has attracted much attention from a large number of researchers. For this task, empirical research has shown that the 1-Nearest Neighbor rule with a distance measure in the time domain usually performs well in a variety of application domains. However, certain time series features are not evident in the time domain. A classical example is the classification of sound, in which representative features are usually present in the frequency domain. For these applications, an alternative representation is necessary. In this work, we investigate the use of recurrence plots as a data representation for time series classification. This representation has well-defined visual texture patterns, and its graphical nature exposes hidden patterns and structural changes in data. Therefore, we propose a method capable of extracting texture features from this graphical representation and using those features to classify time series data. We use traditional methods such as Grey Level Co-occurrence Matrix and Local Binary Patterns, which have shown good results in texture classification. In a comprehensive experimental evaluation, we show that our method outperforms the state-of-the-art methods for time series classification.
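To show what a texture feature looks like in practice, here is a minimal sketch of the basic 8-neighbour Local Binary Patterns histogram, computed over the interior pixels of a grayscale image given as a list of rows. The function name, the neighbour ordering, and the `>=` comparison are assumptions for illustration. The abstract does not specify which LBP variant or parameters the authors use.

```python
def lbp_histogram(image):
    """256-bin histogram of basic 8-neighbour LBP codes.

    `image` is a list of equal-length rows of numeric intensities.
    Border pixels are skipped because they lack a full neighbourhood.
    """
    # Neighbour offsets in clockwise order, starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

The resulting histogram can serve as a fixed-length feature vector for any standard classifier, whatever the length of the original time series.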
The majority of classification algorithms for evolving data streams assume that the actual labels of the predicted examples are readily available, without any time delay, just after a prediction is made. However, given high labeling costs, dependence on an expert, limitations in data transmission, or even restrictions imposed by the nature of the problem, there is a large number of real-world applications in which the availability of actual labels is infinitely delayed (never available). In these cases, it is necessary to use algorithms that do not follow the traditional process of monitoring the error rate to detect changes in the data distribution and using the most recent labeled data to update the classification model. In this paper, we propose the method MClassification to classify evolving data streams with infinitely delayed labels. Our method is inspired by the Micro-Cluster representation used in online clustering algorithms. Considering the presence of incremental drifts, our approach uses a distance-based strategy to keep the Micro-Clusters' positions up to date. An evaluation on several synthetic and real data sets shows that MClassification achieves accuracy competitive with state-of-the-art methods at an adequate computational cost. The main advantage of the proposed method is the absence of critical parameters that require the user's prior knowledge, as occurs with rival methods.
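The general idea of a distance-based update without labels can be sketched as follows: each unlabeled example takes the label of its nearest micro-cluster, which then absorbs the example so that its centroid follows an incremental drift. This is a simplified illustration, not the published MClassification algorithm. The class and function names, and the choice to always absorb the example into the nearest micro-cluster, are assumptions.

```python
import math


class MicroCluster:
    """Labeled summary of nearby points: a count and a linear sum."""

    def __init__(self, point, label):
        self.n = 1
        self.linear_sum = list(point)
        self.label = label

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        # Incremental update: no raw points need to be stored.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def classify_and_update(clusters, point):
    """Predict the label of `point` and move the winning micro-cluster.

    Hypothetical helper: the example is always absorbed by its
    nearest micro-cluster, which is a simplifying assumption.
    """
    nearest = min(clusters, key=lambda mc: math.dist(mc.centroid(), point))
    nearest.absorb(point)
    return nearest.label
```

Because every update uses only the predicted assignment, the model keeps adapting after the initial labeled examples run out. The cost is that an early misassignment can pull a micro-cluster in the wrong direction.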