We propose a genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data. The algorithm selects the number of mixture components using the minimum description length (MDL) criterion. Our approach combines genetic algorithms (GA) and the EM algorithm into a single procedure and thus benefits from the properties of both. The population-based stochastic search of the GA explores the search space more thoroughly than EM alone. Therefore, the algorithm can escape from locally optimal solutions, since it becomes less sensitive to its initialization. GA-EM is elitist, which preserves the monotonic convergence property of the EM algorithm. Experiments on simulated and real data show that GA-EM outperforms EM in two respects: 1) it obtains a better MDL score under exactly the same termination condition for both algorithms; 2) it identifies the number of components used to generate the underlying data more often than EM does.
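To make the model-selection criterion concrete, here is a minimal sketch of plain EM for a one-dimensional Gaussian mixture scored with a common form of MDL, -log L + (P/2) log n, where P is the number of free parameters. This is not the GA-EM procedure itself: there is no population, crossover, or mutation here, only the EM-plus-MDL building block that GA-EM is described as wrapping. The names `em_1d`, `mdl_score`, and `gmm_num_params`, the quantile-based initialization, and the `min_var` floor are illustrative assumptions, and the code omits numerical safeguards such as log-sum-exp.

```python
import math


def gmm_num_params(k, d):
    """Free parameters of a d-dimensional GMM with k full-covariance components."""
    # (k - 1) mixing weights + k*d means + k*d*(d+1)/2 covariance entries
    return (k - 1) + k * d + k * d * (d + 1) // 2


def mdl_score(log_likelihood, k, d, n):
    """MDL = -log L + (P / 2) * log n; smaller is better."""
    return -log_likelihood + 0.5 * gmm_num_params(k, d) * math.log(n)


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_1d(data, k, iters=100, min_var=1e-6):
    """Plain EM for a 1-D GMM; returns (weights, means, variances, log-likelihood)."""
    xs = sorted(data)
    n = len(xs)
    # Hypothetical deterministic initialization: means at evenly spaced quantiles,
    # every variance set to the overall data variance, equal weights.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [max(var_all, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, variances from responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)
    # Log-likelihood of the data under the final parameters
    ll = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in xs
    )
    return weights, means, variances, ll
```

Usage: fit K = 1, 2, ... and keep the K with the smallest `mdl_score(ll, K, 1, len(data))`. Because plain EM is deterministic given its initialization, a poor starting point can still trap it in a local optimum; that sensitivity is what the abstract says the GA population is meant to reduce.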
Standard hidden Markov models (HMMs) have been studied extensively over the last two decades. It is well known that these models assume the observations are conditionally independent given the hidden states. Therefore, they are inadequate for classifying complex and highly structured patterns. The need for new statistical models capable of coping with structured time-series data is increasing. In this paper we propose a novel paradigm that we call the "structural hidden Markov model" (SHMM). It extends traditional HMMs by partitioning the set of observation sequences into equivalence classes. The observation sequences within a class are related in the sense that they all contribute to producing a particular local structure. We describe four basic problems associated with a structural hidden Markov model: (1) probability evaluation, (2) statistical decoding, (3) local structure decoding, and (4) parameter estimation. We have applied SHMMs to mine customers' preferences for automotive designs. The results of this application show that SHMMs outperform the traditional hidden Markov model, with a 9% increase in accuracy.
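For reference, problems (1) and (2) have well-known solutions for a standard discrete HMM: the forward algorithm for probability evaluation and the Viterbi algorithm for decoding. The sketch below shows only these standard-HMM baselines; it does not implement the SHMM extensions (equivalence classes of observation sequences, local structure decoding), whose details are not given here. The function names and the list-of-lists parameter layout (`start[i]`, `trans[i][j]`, `emit[i][symbol]`) are illustrative assumptions.

```python
def forward(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n = len(start)
    # alpha[i] = P(o_1..o_t, state_t = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o] for j in range(n)]
    return sum(alpha)


def viterbi(obs, start, trans, emit):
    """Most likely state path and its joint probability with obs."""
    n = len(start)
    # delta[i] = probability of the best path ending in state i at time t
    delta = [start[i] * emit[i][obs[0]] for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * trans[i][j])
            ptr.append(best_i)
            new.append(delta[best_i] * trans[best_i][j] * emit[j][o])
        back.append(ptr)
        delta = new
    # Trace the best path backwards from the most probable final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

Both routines score every observation symbol independently given the current state, which is exactly the conditional-independence assumption the abstract identifies as a limitation of standard HMMs.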
We present an overall performance comparison between the two most popular approaches to remote sensing image classification: pixel-based and object-based. The evaluation uses several state-of-the-art statistical measures. The classification power of these two widely used methods is analyzed on a Landsat-7 ETM+ image of Algiers using support vector machines. Since the performance of object-based classification inherently depends on the success of the segmentation step, we computed the overall accuracy, the kappa coefficient, the Z-score, the F-measure, and the area under the ROC curve (AUC) for different segmentation thresholds. Quantizing the segmentation level by the number of pixels allowed to define a region (NPR) is necessary because image segmenters, which significantly affect classification, often follow different paradigms and therefore exhibit different error rates. Our investigation revealed that the object-based method is more accurate than the pixel-based method in two scenarios: (i) when a perfect segmentation precedes object-based classification; (ii) whenever NPR is less than 8 pixels (corresponding to 240 m at the image's resolution). The second case is supported by the fact that the AUC of the object-based method is larger than that of the pixel-based method. However, if NPR is not used or exceeds 8 pixels, the pixel-based approach is more appropriate.
Keywords: remote sensing image classification; pixel-based approach; object-based approach; support vector machines; statistical performance evaluation measures.
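Several of the measures named above have standard textbook definitions, sketched below from a confusion matrix and from per-pixel classifier scores. This is an illustrative sketch, not the evaluation code used in the study: the convention that rows are reference classes and columns are predicted classes is an assumption, the Z-score is omitted because its variance estimate for kappa varies between sources, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than by integrating a plotted ROC curve.

```python
def overall_accuracy(cm):
    """Fraction of correctly classified samples (trace / total)."""
    n = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / n


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e)."""
    k = len(cm)
    n = sum(map(sum, cm))
    p_o = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement: product of reference (row) and predicted (column) marginals
    p_e = sum(sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


def f_measure(cm, c):
    """Harmonic mean of precision and recall for class c."""
    tp = cm[c][c]
    predicted = sum(cm[r][c] for r in range(len(cm)))  # column sum
    actual = sum(cm[c])  # row sum
    precision, recall = tp / predicted, tp / actual
    return 2 * precision * recall / (precision + recall)


def auc(scores_pos, scores_neg):
    """P(random positive scores above random negative); ties count one half."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(scores_pos) * len(scores_neg))
```

In a comparison like the one described, these functions would be evaluated once for the pixel-based classifier and once per NPR threshold for the object-based classifier, so that the object-based curve can be compared against a fixed pixel-based baseline.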