2020
DOI: 10.1155/2020/4365356
A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection

Abstract: Using the k-nearest neighbor (kNN) algorithm as a supervised learning method to detect anomalies can yield more accurate results. However, kNN is inefficient at finding the k neighbors in large-scale log data; at the same time, log data are imbalanced in quantity, so selecting proper k neighbors for different data distributions is a challenge. In this paper, we propose a log-based anomaly detection method with efficient selection of neighbors and automatic selection …
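As a rough sketch of the general approach the abstract describes (not the authors' implementation; the feature representation, k, and scoring rule are all assumptions), a kNN-based anomaly scorer over log feature vectors might look like this:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Score each test vector by its mean distance to the k nearest training
# vectors; larger scores suggest anomalies. Everything here is an
# illustrative assumption, not the paper's actual pipeline.
def knn_anomaly_scores(train_vectors, test_vectors, k=5):
    nn = NearestNeighbors(n_neighbors=k).fit(train_vectors)
    distances, _ = nn.kneighbors(test_vectors)
    return distances.mean(axis=1)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 10))          # "normal" log feature vectors
test = np.vstack([rng.normal(0.0, 1.0, size=(5, 10)),
                  rng.normal(6.0, 1.0, size=(2, 10))])  # last two rows are outliers
print(knn_anomaly_scores(normal, test).round(2))
```

The paper's contribution, per the abstract, is making the neighbor search efficient on large-scale logs and choosing k automatically; this sketch uses a fixed k and a brute-force index.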

Cited by 9 publications (4 citation statements)
References 21 publications
“…The log-transformation plays a crucial role here, enabling low concentrations with values of tenths and hundredths of mg/l or µg/l to be separated from each other. This gives the method added value over outlier detection methods that do not use log-transformation (Adikaram et al., 2015; Wang et al., 2020).…”
Section: Discussion (mentioning)
Confidence: 99%
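As a quick illustration of the point that statement makes (the concentration values below are assumed, not data from the cited study): on a linear scale, values spanning orders of magnitude compress near zero, while a log-transform spreads them apart:

```python
import numpy as np

# Concentrations in mg/l (assumed values for illustration).
conc = np.array([0.01, 0.02, 0.1, 0.5, 1.0, 50.0])
print(np.diff(conc))            # linear gaps: the lowest values barely differ
print(np.diff(np.log10(conc)))  # log gaps: tenths and hundredths separate clearly
```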
“…However, its main drawbacks lie in its dependence on the K parameter, which can significantly affect classification results, and its sensitivity to outlier data, so the K parameter must be tuned to achieve optimal accuracy (Wang et al., 2020). The basic principle of K-nearest neighbors (KNN) is to find the data closest to the evaluation data based on the K nearest neighbors in the training dataset.…”
Section: Literature Review (mentioning)
Confidence: 99%
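A common generic way to tune k is a cross-validated grid search, sketched below; the synthetic dataset, k range, and F1 scoring are illustrative choices, not the automatic selection scheme the cited paper proposes:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Cross-validated search over odd values of k on a mildly imbalanced
# dataset (imbalance chosen to echo the log-data setting described above).
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": list(range(1, 21, 2))},
                      scoring="f1", cv=5)
search.fit(X, y)
print("best k:", search.best_params_["n_neighbors"])
```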
“…The kNN classifier is based on a distance function that measures the difference or similarity between two instances (Wang et al., 2020). The standard Euclidean distance "d(x, y)" between two instances "x" and "y" is defined by the formula:…”
Section: kNN Algorithm (mentioning)
Confidence: 99%
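The snippet truncates before the formula itself; the standard Euclidean distance it refers to (the textbook definition, not recovered from this page) is

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```

In code this is a one-liner; the example vectors below are arbitrary:

```python
import numpy as np

# Textbook Euclidean distance between two feature vectors.
def euclidean(x, y):
    return float(np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

print(euclidean([1, 2, 3], [4, 6, 3]))  # 5.0
```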
“…Therefore, for kNN, there is no actual learning phase. This is why it is generally classified as a lazy learning method (Wang et al., 2020).…”
Section: kNN Algorithm (mentioning)
Confidence: 99%