Anomaly Detection with Machine Learning in the Presence of Extreme Value - A Review Paper

Suboh, Syahirah; Aziz, Izzatdin Abdul

doi:10.1109/icbda50157.2020.9289798

Cited by 4 publications

(6 citation statements)

References 31 publications


“…The data used for the current work is an assimilation of data recorded onboard 7000 TEU 6 post-panamax container ship and weather hindcast data obtained from one of the metocean data repositories. The onboard recorded data samples are obtained as 15-minute averaged values using an onboard installed energy management web application, called Marorka Online.…”

Section: Data Exploration and Processing (mentioning)

confidence: 99%

“…However, most of these techniques detect outliers by taking into account the distribution of the data in high dimensional variable (or feature) space, paying not much attention towards the correlation between the variables. Such a technique may cause more harm than good as it would result in detecting extreme values (like extreme weather observations) as well as rare event samples as outliers, which would result in poor predictions in extreme or rare conditions using the models calibrated on the cleaned datasets, as concluded by Suboh and Aziz [6]. Moreover, in case of an unbalanced dataset, the data samples present in the sparse regions of high dimensional variable space would, probably, also suffer the same fate as extreme or rare events, resulting in the loss of valuable information.…”

Section: Introduction (mentioning)

confidence: 99%


Correlation-based outlier detection for ships’ in-service datasets

Gupta, Rasheed, Steen. 2024. J Big Data


With the advent of big data, it has become increasingly difficult to obtain high-quality data. Solutions are required to remove undesired outlier samples from massively large datasets. Ship operators rely on high-frequency in-service datasets recorded onboard the ships for monitoring the performance of their fleet. The large in-service datasets are known to be highly unbalanced, making it difficult to adopt ordinary outlier detection techniques, as they would also result in the removal of rare but quite valuable data samples. Thus, the current work proposes to establish a correlation-based outlier detection scheme for ships’ in-service datasets using two well-known dimensionality reduction methods, namely, Principal Component Analysis (PCA) and Autoencoders. The correlation-based approach detects samples which do not fit the prominent correlations present in the dataset and avoids misidentifying the rare but correlation-following samples in the sparse regions of data domain. The study also attempts to provide the physical meaning of the latent variables obtained using PCA. The effectiveness of the proposed methodology is proven using an actual dataset recorded onboard a ship.
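The correlation-based idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain PCA reconstruction error on standardized data, and the function names, the synthetic two-variable dataset, and the 0.99 quantile threshold are all assumptions made for illustration. The point it shows is that rare samples which follow the dominant correlation get a small reconstruction error, while a sample that breaks the correlation gets a large one.

```python
import numpy as np

def pca_reconstruction_error(X, n_components):
    """Squared reconstruction error per sample after projecting
    standardized data onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid division by zero
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T
    Z_hat = Z @ W @ W.T                      # projection onto retained components
    return np.sum((Z - Z_hat) ** 2, axis=1)

def flag_outliers(X, n_components=1, quantile=0.99):
    """Flag samples whose error exceeds a quantile threshold (assumed rule)."""
    err = pca_reconstruction_error(X, n_components)
    return err > np.quantile(err, quantile), err

# Synthetic data (assumption): y is roughly 2x, with a few rare extreme x values.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.uniform(0.0, 1.0, 2000), rng.uniform(4.0, 5.0, 10)])
ys = 2.0 * xs + rng.normal(0.0, 0.05, xs.size)
X = np.vstack([np.column_stack([xs, ys]), [0.5, 5.0]])  # last row breaks the correlation

rare_idx = np.arange(2000, 2010)   # rare but correlation-following samples
breaker_idx = 2010                 # correlation-breaking sample
flags, err = flag_outliers(X, n_components=1, quantile=0.99)
```

In this sketch the threshold is a tunable assumption; a distance-based detector working on the raw feature space would instead tend to rank the rare samples in `rare_idx` among the most anomalous.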


“…On the contrary, Statisticians developed various algorithms for anomaly detection, but most of the techniques only apply to univariate cases [27]. The process of determining anomaly is more complicated in multivariate datasets compared to univariate datasets.…”

Section: B. Anomaly Detection In High-dimensional and Multivariate Data (mentioning)

confidence: 99%

A Systematic Review of Anomaly Detection within High Dimensional and Multivariate Data

Suboh, Aziz, Shaharudin et al. 2023. JOIV : Int. J. Inform. Visualization


In data analysis, recognizing unusual patterns (outliers’ analysis or anomaly detection) plays a crucial role in identifying critical events. Because of its widespread use in many applications, it remains an important and extensive research brand in data mining. As a result, numerous techniques for finding anomalies have been developed, and more are still being worked on. Researchers can gain vital knowledge by identifying anomalies, which helps them make better meaningful data analyses. However, anomaly detection is even more challenging when the datasets are high-dimensional and multivariate. In the literature, anomaly detection has received much attention but not as much as anomaly detection, specifically in high dimensional and multivariate conditions. This paper systematically reviews the existing related techniques and presents extensive coverage of challenges and perspectives of anomaly detection within high-dimensional and multivariate data. At the same time, it provides a clear insight into the techniques developed for anomaly detection problems. This paper aims to help select the best technique that suits its rightful purpose. It has been found that PCA, DOBIN, Stray algorithm, and DAE-KNN have a high learning rate compared to Random projection, ROBEM, and OCP methods. Overall, most methods have shown an excellent ability to tackle the curse of dimensionality and multivariate features to perform anomaly detection. Moreover, a comparison of each algorithm for anomaly detection is also provided to produce a better algorithm. Finally, it would be a line of future studies to extend by comparing the methods on other domain-specific datasets and offering a comprehensive anomaly interpretation in describing the truth of anomalies.


“…If these data are misjudged as any kind of data, training neural network with valid data will greatly reduce the classification effect of neural network. Therefore, outlier elimination algorithm is used to delete these few points [11]. Concrete method is the first through the principal component analysis for data dimension reduction, the data for dimension reduction after clustering, clustering center here first, respectively defined as the average of the two types of data, clustering is completed, remove from the far point of clustering center and clustering results are inconsistent with the original label, of eliminating outliers is completed.…”

Section: Data Processing Module (mentioning)

confidence: 99%

Real-time cheating detection system based on sight detection

Zou, Shi, Wang. 2022. International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2022)


Due to the COVID-19 pandemic, many exams, written tests and interviews are conducted online and remotely, which raises a series of questions such as how to prevent cheating. In this project, the methods commonly used in the existing cheating monitoring system are fully investigated and their shortcomings are improved one by one. Finally, a line of sight detection algorithm based on computer vision technology is designed, and a prototype of auxiliary cheating detection system that can get good results only with a small number of samples is developed.
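The outlier elimination procedure described in the citation statement quoted for this paper (PCA for dimensionality reduction, clustering with centers initialized at the two class means, then removal of distant or label-inconsistent points) can be sketched roughly as follows. This is one reading of the quoted text, not the authors' code: the use of k-means, the rule that a point is removed if it is either far from its center or assigned to a cluster that disagrees with its label, the 0.98 distance quantile, and all function names are assumptions.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centered data onto its leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans_from_class_means(Z, labels, n_iter=50):
    """k-means whose initial centers are the per-class means."""
    classes = np.unique(labels)
    centers = np.stack([Z[labels == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        new_centers = np.stack([
            Z[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
            for k in range(len(classes))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    assign = dist.argmin(axis=1)
    # Cluster k is identified with class k because of the initialization.
    return classes[assign], dist[np.arange(len(Z)), assign]

def eliminate_outliers(X, labels, n_components=2, dist_quantile=0.98):
    """Return a boolean mask of samples to keep (assumed removal rule)."""
    labels = np.asarray(labels)
    Z = pca_reduce(np.asarray(X, dtype=float), n_components)
    cluster_label, dist = kmeans_from_class_means(Z, labels)
    far = dist > np.quantile(dist, dist_quantile)   # far from assigned center
    mismatch = cluster_label != labels               # disagrees with given label
    return ~(far | mismatch)

# Synthetic two-class data (assumption) with one mislabelled sample at the end.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, (100, 5)),
    rng.normal(6.0, 1.0, (100, 5)),
    np.full((1, 5), 6.0),            # sits in class 1's region, labelled 0
])
labels = np.array([0] * 100 + [1] * 100 + [0])
keep = eliminate_outliers(X, labels)
```

Note that a distance-quantile rule like this always removes a fixed fraction of samples, including legitimate extreme values, which is the kind of side effect the review paper at the top of this page warns about.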
