Abstract—Target classification in hyperspectral imagery has been demonstrated to be very useful in remote-sensing applications. While spectral bands provide information for classification, they give rise to a large number of features. However, a large number of features often degrades performance. In such situations, dimensionality reduction can be very helpful. There are many such techniques in the literature, and the most popular one is Fisher's linear discriminant analysis (LDA). For two-class problems, LDA can be shown to be optimal; for the multi-class case, it is not. As such, a multi-class problem is cast into a binary one. This formulation not only simplifies the problem but also works well in practice; however, it lacks theoretical justification. In this paper we show the connection between the above formulation and Relief feature selection, thereby providing a sound basis for the observed benefits associated with this formulation. Furthermore, we propose a margin-based algorithm for dimensionality reduction that addresses some of the problems facing the two-class formulation. We provide experimental results that corroborate our analysis.

Index Terms—Classification, dimensionality reduction, Relief
I. Introduction

Target classification in hyperspectral imagery has been demonstrated to be very challenging and, at the same time, extremely useful in many remote-sensing applications [1], [2], [3]. While spectral-reflectance measurements provide information for target detection and classification, they generate a large number of features, resulting in a high-dimensional measurement space [4]. However, a large number of features often degrades classification performance, a consequence of the curse of dimensionality. In such situations, feature extraction or selection methods play an important role by significantly reducing the number of features used to build classifiers.

There are many dimensionality reduction techniques for classification in the literature. The most popular one is Fisher's linear discriminant analysis (LDA) [5]. In LDA, we are given a set of $N$ examples $\mathcal{Z} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \Re^q$ are the $q$-dimensional inputs and $y_i$ are scalar labels. Consider a $C$-class problem, where $\mathbf{m}$ is the mean vector of all data and $\mathbf{m}_i$ is the mean vector of the $i$th class data. A within-class scatter matrix characterizes the scatter of samples around their respective class mean vectors, and it is expressed by
$$S_w = \sum_{i=1}^{C} \frac{p_i}{N_i} \sum_{\mathbf{x}_j \in \text{class } i} (\mathbf{x}_j - \mathbf{m}_i)(\mathbf{x}_j - \mathbf{m}_i)^t,$$
where $N_i$ is the number of examples in the $i$th class, $p_i$ ($\sum_i p_i = 1$) represents the proportion of class $i$, and $t$ denotes matrix transpose. A between-class scatter matrix characterizes the scatter of the class means around the overall mean $\mathbf{m}$:
$$S_b = \sum_{i=1}^{C} p_i (\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^t.$$
Thus, LDA finds the projection matrix $W$ that maximizes the objective
$$J(W) = \frac{|W^t S_b W|}{|W^t S_w W|}.$$
We can obtain $W$ that maximizes $J(W)$ by solving the generalized eigenvalue problem $S_b \mathbf{w}_i = \lambda_i S_w \mathbf{w}_i$ (a small numerical sketch of this procedure is given at the end of this section).

From the Bayes perspective, LDA is optimal for two Gaussians with equal covariances [6], [7]. However, LDA is not optimal for multiple Gaussian distributions or classes with unequal covariance matrices. To...
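The LDA procedure above is straightforward to implement. The following is a minimal numerical sketch, not the authors' code: it builds $S_w$ and $S_b$ from labeled data and solves the generalized eigenvalue problem $S_b \mathbf{w}_i = \lambda_i S_w \mathbf{w}_i$ for the leading projection directions. The function name, the synthetic data, and the parameter `n_components` are illustrative assumptions, not quantities from the paper.

```python
# Minimal sketch of the LDA projection described above (illustrative only).
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, n_components):
    """Return a q x n_components matrix W maximizing the Fisher criterion J(W)."""
    classes, counts = np.unique(y, return_counts=True)
    N, q = X.shape
    m = X.mean(axis=0)                       # overall mean
    S_w = np.zeros((q, q))                   # within-class scatter
    S_b = np.zeros((q, q))                   # between-class scatter
    for c, N_i in zip(classes, counts):
        X_i = X[y == c]
        m_i = X_i.mean(axis=0)
        p_i = N_i / N                        # class proportion
        # class covariance around m_i, normalized by N_i (bias=True)
        S_w += p_i * np.cov(X_i, rowvar=False, bias=True)
        d = (m_i - m).reshape(-1, 1)
        S_b += p_i * (d @ d.T)
    # Generalized eigenvalue problem S_b w = lambda S_w w; eigh returns
    # eigenvalues in ascending order, so keep the largest n_components.
    eigvals, eigvecs = eigh(S_b, S_w)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

# Example: project 3-class, 10-dimensional synthetic data onto 2 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=k, size=(50, 10)) for k in range(3)])
y = np.repeat(np.arange(3), 50)
W = lda_projection(X, y, n_components=2)
Z = X @ W
print(Z.shape)  # (150, 2)
```

As the sketch suggests, the projection keeps at most $C-1$ informative directions, since $S_b$ has rank at most $C-1$; this is one reason the two-class reformulation discussed later changes what the method can extract.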