Much real-world data is imbalanced by nature. One reason is the lack of efficient algorithms for organizing the data generated by billions of internet-connected (IoT) devices into a suitable format. Imbalanced data poses a great challenge to both data mining and machine learning algorithms. An imbalanced dataset consists of a majority class and a minority class, where the majority class greatly outnumbers the minority class. Most standard learning algorithms assume a balanced class distribution or equal misclassification costs. When these algorithms are applied to imbalanced data, accuracy is high for the majority class but low for the minority class, resulting in poor performance on the minority class. To overcome this problem and improve the accuracy of the decision-making and prediction process, data mining and machine learning researchers have addressed imbalanced data using data-level, algorithm-level, and ensemble or hybrid methods. This article presents a systematic literature review and analyzes the results of more than 400 research papers published between 2002 and June 2017, resulting in a broad and detailed investigation of the literature in this area. Note that an extension of this work will cover research articles published up to December 2018 and will be published in June 2019; those additional papers are not included here because of page limits. The systematic analysis focuses on the key role of data-intrinsic problems in classification, the handling of imbalanced data, and the techniques used to overcome skewed distributions. Furthermore, this article reveals patterns, trends, and gaps in the existing literature and briefly discusses next-generation research directions in this area.
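The effect of class imbalance on accuracy, and one simple data-level remedy, can be illustrated with a minimal sketch. The toy labels, the 95:5 class ratio, and the function names (`majority_baseline`, `recall`, `random_oversample`) are assumptions chosen for illustration and are not taken from the reviewed papers. Random oversampling is only one of many data-level methods.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label (a classifier that always predicts it)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    preds_for_positive = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positive:
        return 0.0
    return sum(p == positive for p in preds_for_positive) / len(preds_for_positive)


def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority samples until both classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority_count = max(counts.values())
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    extra = [rng.choice(minority_idx)
             for _ in range(majority_count - counts[minority])]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


# Hypothetical toy data: 95 majority samples (label 0), 5 minority samples (label 1).
X = [[float(i)] for i in range(100)]
y_true = [0] * 95 + [1] * 5

# Always predicting the majority class looks accurate but never finds the minority class.
y_pred = [majority_baseline(y_true)] * len(y_true)
overall_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)

# Data-level remedy: rebalance the training set before fitting a classifier.
X_bal, y_bal = random_oversample(X, y_true, minority=1)
balanced_counts = Counter(y_bal)
```

Here the trivial baseline reaches high accuracy while its minority recall is zero, which is why accuracy alone is a misleading measure on skewed data. After oversampling, both classes have the same number of samples.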
In recent years, cyber security has received interest from several research communities with respect to Intrusion Detection Systems (IDS). Cyber security is "a fast-growing field demanding a great deal of attention because of remarkable progresses in social networks, cloud and web technologies, online banking, mobile environment, smart grid, etc." An IDS is software that monitors a single computer or a network of computers for malicious activities (attacks). With increasing internet usage, intrusion detection and prevention is becoming a critical issue. In the past, several techniques have been proposed to detect intrusions in a network, but most of the techniques in use today are not able to solve this problem efficiently. At the same time, Machine Learning (ML) has been adopted in various applications because it provides good accuracy in the respective domains. Hence, this work discusses how machine learning and data mining can be used to detect intrusions in a network in the near future. ML uses methods such as classification and regression, which can deliver high detection rates, low false alarm rates, and low communication costs. This work also provides a detailed comparison in Tables 1-3, covering the algorithms, datasets, and performance metrics used.
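The detection rate and false alarm rate mentioned above can be computed from a classifier's predictions as in the following sketch. It uses the common definitions (detection rate as true positives over all actual attacks, false alarm rate as false positives over all normal traffic), which are assumed here rather than quoted from this work. The function name `ids_metrics` and the toy labels are hypothetical.

```python
def ids_metrics(y_true, y_pred, attack=1):
    """Return (detection_rate, false_alarm_rate) for binary IDS predictions.

    detection_rate   = TP / (TP + FN): share of actual attacks that were flagged.
    false_alarm_rate = FP / (FP + TN): share of normal traffic wrongly flagged.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == attack and pred == attack:
            tp += 1
        elif truth == attack:
            fn += 1
        elif pred == attack:
            fp += 1
        else:
            tn += 1
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical toy data: 1 = attack, 0 = normal traffic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

detection_rate, false_alarm_rate = ids_metrics(y_true, y_pred)
```

In this toy case three of the four attacks are detected and one of the six normal records raises a false alarm.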