Using Imbalanced Triangle Synthetic Data for Machine Learning Anomaly Detection

Luo, Menghua; Wang, Ke; Cai, Zhiping; Liu, Anfeng; Li, Yangyang; Cheang, Chak Fong

doi:10.32604/cmc.2019.03708

Cited by 52 publications

(22 citation statements)

References 13 publications

“…Based on several studies, we found that a commonly used dataset for health data mining was the Pima Indians Diabetes Dataset from the University of California, Irvine (UCI) Machine Learning Database [24][25][26][27][28][29]. The datasets consist of several medical predictor (independent) variables and one target (dependent) variable, Outcome.…”

Section: Methods (mentioning)

confidence: 99%

“…In order to check the performance of the upgraded network has been processed the experimental dataset of [23,24], representing a good dataset for testing LSTM neural network. The experimental dataset [24] has been adopted in the literature for different data mining testing [24][25][26][27][28][29]. Specifically in reference [25], the K-means algorithm has been applied for predicting diabetes, in reference [26] some authors applied synthetic data in order to balance a machine learning dataset model, while references [27][28][29] have analyzed different machine learning algorithms for diabetes prediction.…”

Section: Introduction (mentioning)

confidence: 99%

LSTM DSS Automatism and Dataset Optimization for Diabetes Prediction

Massaro, Maritati, Giannone et al. 2019

Applied Sciences

The paper is focused on the application of Long Short-Term Memory (LSTM) neural network enabling patient health status prediction focusing the attention on diabetes. The proposed topic is an upgrade of a Multi-Layer Perceptron (MLP) algorithm that can be fully embedded into an Enterprise Resource Planning (ERP) platform. The LSTM approach is applied for multi-attribute data processing and it is integrated into an information system based on patient management. To validate the proposed model, we have adopted a typical dataset used in the literature for data mining model testing. The study is focused on the procedure to follow for a correct LSTM data analysis by using artificial records (LSTM-AR-), improving the training dataset stability and test accuracy if compared with traditional MLP and LSTM approaches. The increase of the artificial data is important for all cases where only a few data of the training dataset are available, as for more practical cases. The paper represents a practical application about the LSTM approach into the decision support systems (DSSs) suitable for homecare assistance and for de-hospitalization processes. The paper goal is mainly to provide guidelines for the application of LSTM neural network in type I and II diabetes prediction adopting automatic procedures. A percentage improvement of test set accuracy of 6.5% has been observed by applying the LSTM-AR- approach, comparing results with up-to-date MLP works. The LSTM-AR- neural network can be applied as an alternative approach for all homecare platforms where not enough training sequential dataset is available.

“…Thus batch learning consumes lots of time and space resources, resulting in low efficiency. Besides, in many real-world situations, such as anomaly detection [1] and stock forecasting [2], the data is growing rapidly and evolving. Sometimes the model needs to be trained without waiting for all the data collected.…”

Section: Introduction (mentioning)

confidence: 99%

Incremental Cost-Sensitive Support Vector Machine With Linear-Exponential Loss

Zhao, Wang et al. 2020

IEEE Access

Incremental learning or online learning as a branch of machine learning has attracted more attention recently. For large-scale problems and dynamic data problem, incremental learning overwhelms batch learning, because of its efficient treatment for new data. However, class imbalance problem, which always appears in online classification brings a considerable challenge for incremental learning. The serious class imbalance problem may directly lead to a useless learning system. Cost-sensitive learning is an important learning paradigm for class imbalance problems and widely used in many applications. In this paper, we propose an incremental cost-sensitive learning method to tackle the class imbalance problems in the online situation. This proposed algorithm is based on a novel cost-sensitive support vector machine, which uses the Linear-exponential (LINEX) loss to implement high cost for minority class and low cost for majority class. Using the half-quadratic optimization, we first put forward the algorithm for the cost-sensitive support vector machine, called CSLINEX-SVM*. Then we propose the incremental cost-sensitive algorithm, ICSL-SVM. The results of numeric experiments demonstrate that the proposed incremental algorithm outperforms some conventional batch algorithms except the proposed CSLINEX-SVM*.
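The asymmetry that the abstract attributes to the LINEX loss can be illustrated with the textbook form L(r) = exp(a·r) − a·r − 1. The sketch below is only that generic form, not the CSLINEX-SVM* objective; the function name `linex_loss`, the residual `r`, and the shape parameter `a` are illustrative assumptions, and how the paper maps the loss onto minority and majority classes is not stated in the abstract.

```python
import math


def linex_loss(r: float, a: float) -> float:
    """Textbook LINEX loss: exp(a*r) - a*r - 1 (hypothetical helper).

    For a > 0 the loss grows roughly exponentially for positive r and
    roughly linearly for negative r; a < 0 flips the asymmetry.
    """
    if a == 0:
        raise ValueError("a must be non-zero")
    return math.exp(a * r) - a * r - 1.0


# Same error magnitude, opposite sign: the positive side costs more when a > 0.
costly = linex_loss(1.0, a=1.0)
cheap = linex_loss(-1.0, a=1.0)
```

With `a > 0`, an error of +1 is penalized more heavily than an error of −1, which is the kind of one-sided cost a cost-sensitive learner could assign to the class it cares more about.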

“…This has made the application range of wireless sensor networks more widely expanded, including smart home, intelligent agriculture, and other fields [21][22][23][24][25]. With the development, sensing device has also developed rapidly; the current Internet is experiencing a trend from centralization to marginalization, Cloud computing [26,27], Edge computing [28,29], and Fog computer [30][31][32] which correspond to the new computational model proposed for such development [26,28][33][34][35]. With the rapid rise of artificial intelligence technology [36,37], the combination of artificial intelligence and Internet of Things (IoT) has made it a longer development [38][39][40], which has become the focus of researchers.…”

Section: Introduction (mentioning)

confidence: 99%

Delay and energy-efficient data collection scheme-based matrix filling theory for dynamic traffic IoT

Xiang, Liu, Wang et al. 2019

J Wireless Com Network

Self Cite

Data collection is the basic functions of the Internet of Things (IoT), in which the sensed data are concentrations from sensor nodes to the sink, with a timely style, so the smart response can be done for emergency. The goal of multi-modal sensor data fusion is to obtain simple and accurate data to enhance system reliability and fault tolerance. Energy efficiency and small delay are the most important indicators which govern the performance of IoT. Convergecast is a low-latency data collection strategy based on effective time division multiple access (TDMA), in which each sensor node generates a packet, and m packets can aggregate to a packet. However, in most practical networks, sensor nodes do not necessarily generate packets during each data collection cycle, but instead generate packets from time to time. In the previous convergecast strategy, each node was fixedly allocated a slot, which increased the delay and wasted energy. A delay and energy-efficient data collection (DEEDC) scheme-based matrix filling theory is proposed to collect data in a randomly generated WSNs with minimum delay and energy consumption. The DEEDC scheme uses a clustering approach. For each cluster, the number of slots required for transmission is calculated by matrix filling theory, not the number of nodes that actually generate data. This ensures that data can be collected in a network with randomly generated data (number of slots ≤ number of nodes), thereby avoiding the allocation of slots for each node and the acquisition of redundant data to lead to the wastage of time and energy. Based on the above, a mixed slot scheduling strategy is proposed to construct energy and delay-efficient, collision-free schedule scheme. After extensive theoretical analysis, by using the DEEDC scheme, the delay is reduced by about 50~80%, and the energy consumed is reduced by about 40~57%.
