Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
Living organisms including fishes, microbes, and animals can live in extremely cold weather. To stay alive in cold environments, these species generate antifreeze proteins (AFPs), also referred to as ice-binding proteins. Moreover, AFPs are extensively utilized in many important fields including medical, agricultural, industrial, and biotechnological. Several predictors were constructed to identify AFPs. However, due to the sequence and structural heterogeneity of AFPs, correct identification is still a challenging task. It is highly desirable to develop a more promising predictor. In this research, a novel computational method, named AFP-LXGB has been proposed for prediction of AFPs more precisely. The information is explored by Dipeptide Composition (DPC), Grouped Amino Acid Composition (GAAC), Position Specific Scoring Matrix-Segmentation-Autocorrelation Transformation (Sg-PSSM-ACT), and Pseudo Position Specific Scoring Matrix Tri-Slicing (PseTS-PSSM). Keeping the benefits of ensemble learning, these feature sets are concatenated into different combinations. The best feature set is selected by Extremely Randomized Tree-Recursive Feature Elimination (ERT-RFE). The models are trained by Light eXtreme Gradient Boosting (LXGB), Random Forest (RF), and Extremely Randomized Tree (ERT). Among classifiers, LXGB has obtained the best prediction results. The novel method (AFP-LXGB) improved the accuracies by 3.70% and 4.09% than the best methods. These results verified that AFP-LXGB can predict AFPs more accurately and can participate in a significant role in medical, agricultural, industrial, and biotechnological fields.
Electricity theft harms smart grids and results in huge revenue losses for electric companies. Deep learning (DL), machine learning (ML), and statistical methods have been used in recent research studies to detect anomalies and illegal patterns in electricity consumption (EC) data collected by smart meters. In this paper, we propose a hybrid DL model for detecting theft activity in EC data. The model combines both a gated recurrent unit (GRU) and a convolutional neural network (CNN). The model distinguishes between legitimate and malicious EC patterns. GRU layers are used to extract temporal patterns, while the CNN is used to retrieve optimal abstract or latent patterns from EC data. Moreover, imbalance of data classes negatively affects the consistency of ML and DL. In this paper, an adaptive synthetic (ADASYN) method and TomekLinks are used to deal with the imbalance of data classes. In addition, the performance of the hybrid model is evaluated using a real-time EC dataset from the State Grid Corporation of China (SGCC). The proposed algorithm is computationally expensive, but on the other hand, it provides higher accuracy than the other algorithms used for comparison. With more and more computational resources available nowadays, researchers are focusing on algorithms that provide better efficiency in the face of widespread data. Various performance metrics such as F1-score, precision, recall, accuracy, and false positive rate are used to investigate the effectiveness of the hybrid DL model. The proposed model outperforms its counterparts with 0.985 Precision–Recall Area Under Curve (PR-AUC) and 0.987 Receiver Operating Characteristic Area Under Curve (ROC-AUC) for the data of EC.
Owing to the development and expansion of energy-aware sensing devices and autonomous and intelligent systems, the Internet of Things (IoT) has gained remarkable growth and found uses in several day-to-day applications. However, IoT devices are highly prone to botnet attacks. To mitigate this threat, a lightweight and anomaly-based detection mechanism that can create profiles for malicious and normal actions on IoT networks could be developed. Additionally, the massive volume of data generated by IoT gadgets could be analyzed by machine learning (ML) methods. Recently, several deep learning (DL)-related mechanisms have been modeled to detect attacks on the IoT. This article designs a botnet detection model using the barnacles mating optimizer with machine learning (BND-BMOML) for the IoT environment. The presented BND-BMOML model focuses on the identification and recognition of botnets in the IoT environment. To accomplish this, the BND-BMOML model initially follows a data standardization approach. In the presented BND-BMOML model, the BMO algorithm is employed to select a useful set of features. For botnet detection, the BND-BMOML model in this study employs an Elman neural network (ENN) model. Finally, the presented BND-BMOML model uses a chicken swarm optimization (CSO) algorithm for the parameter tuning process, demonstrating the novelty of the work. The BND-BMOML method was experimentally validated using a benchmark dataset and the outcomes indicated significant improvements in performance over existing methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.