A large set of machine learning and pattern classification algorithms trained and tested on the KDD intrusion detection data set failed to identify most of the user-to-root and remote-to-local attacks, as many researchers have reported in the literature. In light of this observation, this paper exposes the deficiencies and limitations of the KDD data set to argue that it should not be used to train pattern recognition or machine learning algorithms for misuse detection of these two attack categories. Multiple analysis techniques are employed to demonstrate, both objectively and subjectively, that the KDD training and testing data subsets represent dissimilar target hypotheses for the user-to-root and remote-to-local attack categories. These techniques consist of switching the roles of the original training and testing data subsets to develop a decision tree classifier, cross-validation on the merged training and testing data subsets, and qualitative and comparative analysis of rules generated independently on the training and testing data subsets through the C4.5 decision tree algorithm. Analysis results clearly suggest that no pattern classification or machine learning algorithm can be trained successfully with the KDD data set to perform misuse detection for the user-to-root or remote-to-local attack categories. It is further noted that the analysis techniques employed to assess the similarity between the two target hypotheses represented by the training and testing data subsets readily generalize to data set pairs in other problem domains.
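The evaluation protocols named above (role-swapping of the subsets and cross-validation on their union) can be sketched generically. The following is a minimal illustration, not the paper's experiment: it uses a hypothetical 1-nearest-neighbour rule in place of C4.5, and synthetic two-class data in which the "training" and "testing" subsets deliberately assign conflicting labels, mimicking dissimilar target hypotheses.

```python
import random

# Hypothetical stand-in classifier: a 1-nearest-neighbour rule
# (the paper uses C4.5; this only illustrates the evaluation protocol).
def train_1nn(rows):
    return rows  # "training" just memorises the labelled rows

def predict_1nn(model, x):
    nearest = min(model, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))
    return nearest[1]

def accuracy(model, rows):
    return sum(predict_1nn(model, x) == y for x, y in rows) / len(rows)

# Synthetic "training" and "testing" subsets whose labels conflict,
# mimicking dissimilar target hypotheses in the two subsets.
random.seed(0)
train = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(50)] + \
        [((random.gauss(3, 1), random.gauss(3, 1)), 1) for _ in range(50)]
test  = [((random.gauss(0, 1), random.gauss(0, 1)), 1) for _ in range(50)] + \
        [((random.gauss(3, 1), random.gauss(3, 1)), 0) for _ in range(50)]

# Protocol 1: train on the original training subset, evaluate on the testing subset.
acc_forward = accuracy(train_1nn(train), test)
# Protocol 2: swap the roles of the two subsets.
acc_swapped = accuracy(train_1nn(test), train)
# Protocol 3: k-fold cross-validation on the merged subsets.
merged = train + test
random.shuffle(merged)
k = 5
fold = len(merged) // k
cv_accs = []
for i in range(k):
    held = merged[i * fold:(i + 1) * fold]
    rest = merged[:i * fold] + merged[(i + 1) * fold:]
    cv_accs.append(accuracy(train_1nn(rest), held))
acc_cv = sum(cv_accs) / k

print(acc_forward, acc_swapped, acc_cv)
```

When the two subsets encode different hypotheses, the forward and swapped accuracies collapse while merged cross-validation hovers near chance; this is the signature the paper's analysis looks for.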
This article presents a simulation study validating an adaptation methodology for learning the weights of a Hopfield neural network configured as a static optimizer. The quadratic Lyapunov function associated with the Hopfield network dynamics is leveraged to map the set of constraints of a static optimization problem onto the network. This mapping yields a set of constraint-specific penalty or weighting coefficients whose values must be defined. The methodology employs a learning-based approach to set the values of these constraint weighting coefficients through adaptation. These values are in turn used to compute the network weights, effectively eliminating the guesswork in defining weight values for a given static optimization problem, a long-standing challenge in artificial neural networks. The simulation study uses the Traveling Salesman Problem from the domain of combinatorial optimization. Simulation results indicate that the adaptation procedure guides the Hopfield network toward solutions of the problem starting from random values for the weights and constraint weighting coefficients. At the conclusion of the adaptation phase, the Hopfield network acquires weight values that readily position it to search for local minimum solutions. The demonstrated success of the adaptation procedure eliminates the need to guess or predetermine the values of the Hopfield network weights.
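To make the constraint-to-energy mapping concrete, here is a minimal sketch of the classic Hopfield-Tank quadratic energy for a small Traveling Salesman instance. The penalty coefficients A, B, C, D below are hand-picked assumptions, standing in for the values the paper's adaptation procedure would learn; the descent is a simple asynchronous bit-flip rule rather than the paper's network dynamics.

```python
import math
import random

# Small synthetic TSP instance: n cities in a 10x10 square.
random.seed(1)
n = 5
cities = [(random.random() * 10, random.random() * 10) for _ in range(n)]
d = [[math.dist(a, b) for b in cities] for a in cities]

# Assumed (hand-tuned) constraint weighting coefficients; the paper's
# methodology learns such values through adaptation instead.
A, B, C, D = 1000.0, 1000.0, 200.0, 1.0

def energy(V):
    """Quadratic (Lyapunov-style) energy of a 0/1 city-by-position matrix V."""
    # A-term: penalise more than one active unit per city (row).
    e = A / 2 * sum(V[x][i] * V[x][j]
                    for x in range(n) for i in range(n) for j in range(n) if i != j)
    # B-term: penalise more than one active unit per tour position (column).
    e += B / 2 * sum(V[x][i] * V[y][i]
                     for i in range(n) for x in range(n) for y in range(n) if x != y)
    # C-term: require exactly n active units in total.
    s = sum(map(sum, V))
    e += C / 2 * (s - n) ** 2
    # D-term: tour length between consecutive positions (cyclic).
    e += D / 2 * sum(d[x][y] * V[x][i] * (V[y][(i + 1) % n] + V[y][(i - 1) % n])
                     for x in range(n) for y in range(n) for i in range(n))
    return e

# Asynchronous greedy descent: flip any unit whose flip lowers the energy,
# until a local minimum of the energy surface is reached.
V = [[0] * n for _ in range(n)]
improved = True
while improved:
    improved = False
    for x in range(n):
        for i in range(n):
            e0 = energy(V)
            V[x][i] ^= 1
            if energy(V) < e0:
                improved = True
            else:
                V[x][i] ^= 1  # revert: the flip did not lower the energy

row_sums = [sum(V[x]) for x in range(n)]
col_sums = [sum(V[x][i] for x in range(n)) for i in range(n)]
print(row_sums, col_sums)
```

With suitably balanced coefficients the local minimum is a permutation matrix, i.e. a feasible tour; poorly chosen coefficients yield infeasible minima, which is precisely the guesswork the adaptation methodology is designed to remove.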
This paper presents the application of machine learning ensembles, which randomly project the original high-dimensional feature space onto multiple lower-dimensional feature subspaces, to classification problems with high-dimensional feature spaces. The motivation is to address challenges associated with algorithm scalability, data sparsity, and information loss due to the so-called curse of dimensionality. Each randomly projected subspace constitutes the domain of a classification subtask and is associated with a base learner within an ensemble machine-learner context. Such an ensemble conceptualization is called a random subsample ensemble. Simulations performed on data sets with up to 20,000 features indicate that the random subsample ensemble classifier performs comparably to other benchmark machine learners on the performance measures of prediction accuracy and CPU time. This finding establishes the feasibility of the ensemble and positions it to tackle classification problems with even higher-dimensional feature spaces.
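The ensemble construction described above can be sketched in a few lines. This is an illustrative toy, not the paper's system: the base learner is a hypothetical nearest-centroid rule, the data are synthetic, and the subspaces are random feature subsets combined by majority vote.

```python
import random

# Random-subspace ("random subsample") ensemble sketch: each base learner
# sees only a random lower-dimensional projection of the feature space,
# and the ensemble combines predictions by majority vote.
random.seed(42)
DIM, SUB_DIM, N_LEARNERS = 200, 10, 15

def make_point(label):
    # Synthetic data: class 0 centred at 0, class 1 shifted by +1 per dimension.
    shift = float(label)
    return [random.gauss(shift, 1.0) for _ in range(DIM)], label

data = [make_point(label) for label in (0, 1) for _ in range(40)]
random.shuffle(data)
train, test = data[:60], data[60:]

def fit_centroid(rows, dims):
    # Hypothetical base learner: per-class centroid in the projected subspace.
    cents = {}
    for label in (0, 1):
        pts = [[x[d] for d in dims] for x, y in rows if y == label]
        cents[label] = [sum(col) / len(col) for col in zip(*pts)]
    return cents

def predict(cents, dims, x):
    proj = [x[d] for d in dims]
    return min(cents, key=lambda l: sum((a - b) ** 2 for a, b in zip(cents[l], proj)))

# One random feature subspace per base learner.
subspaces = [random.sample(range(DIM), SUB_DIM) for _ in range(N_LEARNERS)]
models = [fit_centroid(train, dims) for dims in subspaces]

def ensemble_predict(x):
    votes = [predict(m, dims, x) for m, dims in zip(models, subspaces)]
    return max(set(votes), key=votes.count)

acc = sum(ensemble_predict(x) == y for x, y in test) / len(test)
print(acc)
```

Each base learner trains on only SUB_DIM of the DIM features, which is what keeps the per-learner cost low as the original dimensionality grows, while the vote recovers accuracy lost to any single projection.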