Abstract: The main aim of this work is to develop and implement an automatic anomaly-detection algorithm for meteorological time series. To this end, we develop an approach to constructing an ensemble of anomaly detectors combined with adaptive threshold selection based on artificially generated anomalies. We demonstrate the efficiency of the proposed method by integrating its implementation into the "Minimax-94" road weather information system.
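The abstract above pairs a detector ensemble with a threshold tuned on artificially generated anomalies. A minimal sketch of that idea, assuming two toy base detectors (rolling z-score and first difference) and synthetic spike injection — all function names, constants, and the F1-based threshold search here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def zscore_detector(x, window=24):
    """Rolling z-score anomaly score (one simple base detector)."""
    scores = np.zeros_like(x, dtype=float)
    for i in range(window, len(x)):
        seg = x[i - window:i]
        sd = seg.std() or 1.0
        scores[i] = abs(x[i] - seg.mean()) / sd
    return scores

def diff_detector(x):
    """Score by magnitude of the first difference (a second base detector)."""
    d = np.abs(np.diff(x, prepend=x[0]))
    return d / (d.std() or 1.0)

def fit_threshold(x, detectors, n_anoms=20, shift=5.0, seed=0):
    """Inject synthetic spikes, then pick the ensemble threshold that
    maximizes F1 against the known injection labels."""
    rng = np.random.default_rng(seed)
    y = np.zeros(len(x), dtype=bool)
    xs = x.copy()
    idx = rng.choice(len(x), size=n_anoms, replace=False)
    xs[idx] += shift
    y[idx] = True
    score = np.mean([d(xs) for d in detectors], axis=0)  # simple averaging ensemble
    best_t, best_f1 = 0.0, -1.0
    for t in np.quantile(score, np.linspace(0.5, 0.999, 100)):
        pred = score > t
        tp = np.sum(pred & y); fp = np.sum(pred & ~y); fn = np.sum(~pred & y)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

x = np.sin(np.linspace(0, 20, 500)) + np.random.default_rng(1).normal(0, 0.2, 500)
t = fit_threshold(x, [zscore_detector, diff_detector])
```

The threshold adapts to the data at hand because it is re-fit on each series with freshly injected anomalies, rather than being a fixed global constant.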
“…Earthquakes of large magnitude are rare events, a kind of anomaly. Thus we can first detect sequences of anomalies of different types in the historical stream of earthquake data [3,26,9,19,32], and then construct ensembles for rare-event prediction [2,29] that use the detected anomalies and their features as precursors of major earthquakes. Such ensembles can optimize specific detection metrics similar to the one used in [7] and exploit privileged information about future events, which is accessible only during the training stage; an analogous approach, used in [8,28] for anomaly detection, yielded a significant accuracy improvement. Moreover, historical earthquake data has a spatial component, so a graph of dependencies between the event streams registered by different ground stations can be constructed, enabling modern methods for graph feature learning [20] and panel time-series feature extraction [24,23]. The ROC AUC score measures the quality of a binary classifier.…”
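The snippet above closes by noting that ROC AUC measures binary-classifier quality. ROC AUC equals the probability that a randomly chosen positive example outranks a randomly chosen negative one, which gives a compact numpy computation (a pairwise sketch; fine for small samples, quadratic in sample size):

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a
    random negative (equivalent to the Mann-Whitney U statistic)."""
    y_true = np.asarray(y_true, dtype=bool)
    pos, neg = scores[y_true], scores[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()  # ties count half
    return wins / (len(pos) * len(neg))

auc = roc_auc([0, 0, 1, 1], np.array([0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```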
We construct a classification model that predicts whether an earthquake with magnitude above a threshold will take place at a given location within 30–180 days of a given moment in time. A common approach relies on expert forecasts based on features such as Region-Time-Length (RTL) characteristics. The proposed approach applies machine learning on top of multiple RTL features to take into account effects at various scales and to improve prediction accuracy. On historical data for Japanese earthquakes in 1992–2005, with predictions at the locations given in this database, the best model achieves precision up to ∼0.95 and recall up to ∼0.98.
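The abstract's key ingredient is computing RTL characteristics at several spatial and temporal scales and feeding them to a classifier. A rough numpy sketch of a multi-scale RTL feature vector — the rupture-length scaling constants, the exponential weighting, and the scale grid below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def rtl(events, t_now, xy_now, r0, t0):
    """Region-Time-Length characteristic at location xy_now and time t_now.
    events: (N, 4) array of rows (t_i, x_i, y_i, mag_i) for past earthquakes."""
    t_i, x_i, y_i, m_i = events[events[:, 0] < t_now].T
    r_i = np.hypot(x_i - xy_now[0], y_i - xy_now[1])
    l_i = 10 ** (0.5 * m_i - 1.8)              # rupture length from magnitude (assumed scaling)
    R = np.sum(np.exp(-r_i / r0))              # spatial weight
    T = np.sum(np.exp(-(t_now - t_i) / t0))    # temporal weight
    L = np.sum(l_i / np.maximum(r_i, 1e-3))    # length-to-distance weight
    return R * T * L

def rtl_features(events, t_now, xy_now, scales):
    """One RTL value per (r0, t0) scale -> feature vector for a classifier."""
    return np.array([rtl(events, t_now, xy_now, r0, t0) for r0, t0 in scales])

rng = np.random.default_rng(0)
events = np.column_stack([
    rng.uniform(0, 365, 200),        # time (days)
    rng.uniform(0, 100, (200, 2)),   # x, y (km)
    rng.uniform(3, 6, 200),          # magnitude
])
feats = rtl_features(events, 365.0, (50.0, 50.0), [(10, 30), (25, 60), (50, 120)])
```

Any standard classifier (e.g. gradient boosting) can then be trained on such vectors, one per location and time.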
“…Almost any feature-based machine learning method may be applied to anomaly detection problems; approaches described in the literature include principal component analysis, support vector machines (Tran et al, 2019), HDOutliers (Leigh et al, 2018), k-nearest neighbors (Russo et al, 2020; Talagala et al, 2019), clustering (Hill and Minsker, 2010), random forest (Russo et al, 2020), XGBoost, and isolation forest (Smolyakov et al, 2019). The success of feature-based techniques in detecting anomalies in environmental sensor data is mixed (Hill and Minsker, 2010; Leigh et al, 2018; Russo et al, 2020).…”
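Of the methods listed above, the k-nearest-neighbor detector is simple enough to sketch in a few lines of numpy: a point's anomaly score is its mean distance to its k closest neighbors, so isolated points score high (a toy illustration, not any cited paper's implementation):

```python
import numpy as np

def knn_anomaly_scores(X, k=5):
    """k-nearest-neighbor anomaly score: mean distance to the k closest
    other points; isolated points get high scores."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                                 # exclude self-distance
    knn = np.sort(d, axis=1)[:, :k]
    return knn.mean(axis=1)

rng = np.random.default_rng(42)
X = rng.normal(0, 1, (100, 2))
X[0] = [8.0, 8.0]                  # plant one obvious outlier
scores = knn_anomaly_scores(X)     # scores.argmax() recovers the outlier
```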
Sensors measuring environmental phenomena at high frequency commonly report anomalies related to fouling, sensor drift and calibration, and data logging and transmission issues. The suitability of data for analysis and decision making often depends on manual review and adjustment. Machine learning techniques have the potential to automate identification and correction of anomalies, streamlining the quality-control process. We explored approaches for automating anomaly detection and correction of aquatic sensor data, implemented in a Python package (PyHydroQC). We applied both classical and deep learning time-series regression models that estimate values, identify anomalies based on dynamic thresholds, and offer correction estimates. The techniques were developed, and their performance assessed, using data reviewed, corrected, and labeled by technicians in an aquatic monitoring use case. Auto-Regressive Integrated Moving Average (ARIMA) models consistently performed best, and aggregating results from multiple models improved detection. PyHydroQC includes custom functions and a workflow for anomaly detection and correction.
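The core pattern in the abstract — a regression model estimates values, and points whose residuals exceed a dynamic threshold are flagged — can be sketched without reproducing PyHydroQC's actual API. The block below is a numpy-only stand-in: a least-squares AR(p) model substitutes for ARIMA, and the threshold is a rolling n-sigma bound on recent residuals (all names and constants are illustrative assumptions):

```python
import numpy as np

def ar_predict(x, p=3):
    """Least-squares AR(p) one-step-ahead predictions (a simple stand-in
    for an ARIMA regression model)."""
    X = np.column_stack([x[i:len(x) - p + i] for i in range(p)])  # lag features
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y, X @ coef

def dynamic_anomalies(x, p=3, window=50, n_sigma=4.0):
    """Flag points whose residual exceeds a rolling n-sigma threshold."""
    y, pred = ar_predict(x, p)
    resid = np.abs(y - pred)
    flags = np.zeros(len(resid), dtype=bool)
    for i in range(window, len(resid)):
        flags[i] = resid[i] > n_sigma * resid[i - window:i].std()
    return flags

rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(0, 0.1, 600))   # smooth sensor-like signal
x[400] += 5.0                            # inject a spike
flags = dynamic_anomalies(x)             # the spike is flagged
```

Because the threshold is computed from a rolling window of residuals, it adapts as the sensor's noise level drifts, which fixed thresholds cannot do.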
“…In recent years, the popularity and the results achieved by the ensemble approach to the outlier detection problem have grown as well. The current state of ensemble analysis and various ensemble procedures for outlier detection are presented in [12–17]. Although outlier detection and changepoint detection are often considered subproblems of the general anomaly detection problem, the ensemble approach to changepoint detection is weakly formalized and far less studied.…”
Section: Introduction
“…Model-centered ensembles are those in which we vary the models used to create the ensemble, rather than picking subsets of data points or data features (data-centered). A variety of scaling and aggregation functions for outlier, changepoint, and classification ensembles, as well as related issues, can be found in [12–14,16,18–20,27,28]. Though scaling can be included in, and considered part of, the aggregation procedure [4], we treat it separately from the aggregation function.…”
“…The difference between static and dynamic weighting is presented in [29]. Commonly, the weights for the various models or cost functions are predetermined [16,29]. For unsupervised offline ensembles, the weights can express the degree of confidence in each separate detector.…”
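The two ingredients discussed in the snippets above — scaling raw scores to a common range, then aggregating them with predetermined (static) weights — fit in a short sketch. Assuming min-max scaling and a weighted mean (one of many scaling/aggregation choices surveyed in the cited papers):

```python
import numpy as np

def minmax_scale(s):
    """Bring raw detector scores onto [0, 1] so they are comparable."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span else np.zeros_like(s)

def weighted_aggregate(score_sets, weights):
    """Static weighting: combine scaled scores with predetermined weights
    expressing confidence in each separate detector."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                 # normalize to sum to 1
    S = np.vstack([minmax_scale(s) for s in score_sets])
    return w @ S

rng = np.random.default_rng(7)
s1 = rng.uniform(0, 10, 50)      # detector A, arbitrary score scale
s2 = rng.uniform(0, 1, 50)       # detector B, different scale
combined = weighted_aggregate([s1, s2], weights=[0.7, 0.3])
```

Scaling is kept as a separate step here, mirroring the snippet's point that it can be treated independently of the aggregation function.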
Offline changepoint detection (CPD) algorithms are used to segment a signal in an optimal way. Generally, these algorithms assume that the changed statistical properties of the signal are known and that appropriate models (metrics, cost functions) for changepoint detection are used. Otherwise, selecting a proper model can become laborious and time-consuming, with uncertain results. Although the ensemble approach is well known to increase the robustness of individual algorithms and to address these challenges, it is weakly formalized and far less studied for CPD problems than for outlier detection or classification. This paper proposes an unsupervised CPD ensemble (CPDE) procedure, with pseudocode for the proposed ensemble algorithms and a link to their Python implementation. The novelty of the approach lies in aggregating several cost functions before running the changepoint search procedure during offline analysis. Numerical experiments showed that the proposed CPDE outperforms non-ensemble CPD procedures. We also analyzed common CPD algorithms, scaling functions, and aggregation functions, comparing them in the experiments. Results were obtained on two anomaly benchmarks containing industrial faults and failures: the Tennessee Eastman Process (TEP) and the Skoltech Anomaly Benchmark (SKAB). One possible application of this research is estimating failure time for the fault identification and isolation problems of technical diagnostics.
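The abstract's central idea — aggregate several scaled cost functions before running a single changepoint search — can be illustrated for the one-changepoint case. The two costs, the min-max scaling, and the exhaustive split search below are illustrative stand-ins, not the paper's CPDE algorithms:

```python
import numpy as np

def cost_mean(seg):
    """L2 cost: within-segment deviation from the segment mean."""
    return np.sum((seg - seg.mean()) ** 2)

def cost_var(seg):
    """Gaussian variance cost: segment length times log segment variance."""
    return len(seg) * np.log(seg.var() + 1e-8)

def single_changepoint(x, costs, margin=10):
    """Scale and sum several cost functions, then run ONE search:
    pick the split minimizing the aggregated left+right cost."""
    totals = []
    for t in range(margin, len(x) - margin):
        totals.append([c(x[:t]) + c(x[t:]) for c in costs])
    T = np.array(totals)
    T = (T - T.min(axis=0)) / (T.max(axis=0) - T.min(axis=0) + 1e-12)  # min-max per cost
    agg = T.sum(axis=1)                     # aggregation happens before the search
    return margin + int(np.argmin(agg))

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])  # shift at t=200
cp = single_changepoint(x, [cost_mean, cost_var])
```

A non-ensemble procedure would instead search with each cost separately and then have to reconcile the resulting changepoint sets; aggregating costs first yields a single consistent segmentation.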
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.