The error matrix has been adopted as both the “de facto” and the “de jure” standard way to report the thematic accuracy assessment of any remotely sensed data product. This perspective assumes that the error matrix can be considered as a set of values following a single multinomial distribution. However, this assumption about the underlying statistical model breaks down when true reference data are available for quality control. To overcome this problem, a new method for thematic accuracy quality control is proposed, which uses a multinomial approach for each category and is called QCCS (quality control column set). Its main advantage is that it allows us to state a set of quality specifications for each class and to test whether they are fulfilled. These requirements can relate to the percentage of correct classifications for a particular class, but also to the percentage of possible misclassifications or confusions between classes. To test whether such specifications are met, an exact multinomial test is proposed for each category. Furthermore, if a global hypothesis test is desired, the Bonferroni correction is proposed. Together, these approaches allow a more flexible way of understanding and testing thematic accuracy quality control than the classical methods based on the confusion matrix. For a better understanding, a practical application to a classification with four categories is included.
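The per-category test and the Bonferroni step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the counts and specified proportions are invented for the example, and the p-value uses the common "probability ordering" definition of the exact multinomial test (sum the probabilities of all outcomes no more likely than the observed one), which may differ from the ordering used in the QCCS paper.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` under a multinomial with `probs`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an integer at every step
    return coef * prod(p ** c for p, c in zip(probs, counts))


def compositions(n, k):
    """Yield every way to split n sample units among k cells."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def exact_multinomial_pvalue(counts, probs):
    """Exact p-value: total probability of outcomes no more likely than `counts`.

    Assumption: probability ordering; feasible only for small column totals,
    because every possible outcome is enumerated.
    """
    observed = multinomial_pmf(counts, probs)
    tolerance = observed * 1e-12  # guards against floating-point ties
    total = 0.0
    for outcome in compositions(sum(counts), len(counts)):
        p = multinomial_pmf(outcome, probs)
        if p <= observed + tolerance:
            total += p
    return total


def bonferroni_global_test(columns, specs, alpha=0.05):
    """Test each column at level alpha / k; reject globally if any column rejects."""
    pvalues = [exact_multinomial_pvalue(c, p) for c, p in zip(columns, specs)]
    threshold = alpha / len(pvalues)
    return pvalues, any(p < threshold for p in pvalues)


# Hypothetical four-category example: each tuple is one column of the matrix
# (20 reference units per class), each spec is the stated proportion vector.
columns = [(17, 1, 1, 1), (2, 16, 1, 1), (0, 3, 15, 2), (1, 1, 6, 12)]
specs = [(0.85, 0.05, 0.05, 0.05), (0.05, 0.85, 0.05, 0.05),
         (0.05, 0.05, 0.85, 0.05), (0.05, 0.05, 0.05, 0.85)]
pvalues, reject_global = bonferroni_global_test(columns, specs)
print(pvalues, reject_global)
```

Testing each column at level alpha / k keeps the overall type I error at or below alpha without assuming the columns share one multinomial distribution, which is the point of treating each category separately.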
Abstract: The confusion matrix is the standard way to report on the thematic accuracy of geographic data (spatial databases, topographic maps, thematic maps, classified images, remote sensing products, etc.). Two widely adopted indices for the assessment of thematic quality are derived from the confusion matrix: overall accuracy (OA) and the Kappa coefficient (k), both of which have been criticized by some authors. Either index can be used to test the similarity of two independent classifications by means of a simple statistical hypothesis test, which is the usual practice. Nevertheless, this is not recommended, because computing these indices aggregates the data, so different combinations of cell values in the matrix can yield the same value of OA or k. Thus, failing to reject the hypothesis of equality between two index values does not necessarily mean that the two matrices are similar. Therefore, we present a new statistical tool to evaluate the similarity between two confusion matrices. It takes into account that the number of sample units correctly and incorrectly classified can be modeled by means of a multinomial distribution. Thus, it uses the individual cell values in the matrices rather than aggregated information, such as the OA or k values. For this purpose, we consider a test function based on the discrete squared Hellinger distance, which is a measure of similarity between probability distributions. Given that the asymptotic approximation of the null distribution of the test statistic is rather poor for small and moderate sample sizes, we used a bootstrap estimator. To explore how the p-value evolves, we applied the proposed method to several predefined matrices that are perturbed within a specified range. Finally, a complete numerical example of the comparison of two matrices is presented.
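As a rough illustration of the idea, the sketch below compares two confusion matrices cell by cell with the squared Hellinger distance and estimates a p-value by bootstrap. It rests on several assumptions that are not taken from the paper: the matrices are flattened into cell-count lists, the raw distance is used as the test statistic without any sample-size scaling, and the null distribution is simulated by a parametric bootstrap from the pooled cell proportions. All function names and the example counts are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def proportions(counts):
    """Convert cell counts to estimated cell probabilities."""
    total = sum(counts)
    return [c / total for c in counts]


def squared_hellinger(p, q):
    """Discrete squared Hellinger distance; 0 for identical, 1 for disjoint."""
    return 0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))


def sample_multinomial(n, probs, rng):
    """Draw one multinomial sample of size n as a list of cell counts."""
    draws = Counter(rng.choices(range(len(probs)), weights=probs, k=n))
    return [draws.get(i, 0) for i in range(len(probs))]


def bootstrap_pvalue(cells_a, cells_b, n_boot=2000, seed=0):
    """Share of bootstrap distances at least as large as the observed one.

    Assumption: under the null both matrices share the pooled cell
    probabilities, and resamples keep the original sample sizes.
    """
    rng = random.Random(seed)
    n, m = sum(cells_a), sum(cells_b)
    observed = squared_hellinger(proportions(cells_a), proportions(cells_b))
    pooled = proportions([a + b for a, b in zip(cells_a, cells_b)])
    exceed = 0
    for _ in range(n_boot):
        sim_a = sample_multinomial(n, pooled, rng)
        sim_b = sample_multinomial(m, pooled, rng)
        stat = squared_hellinger(proportions(sim_a), proportions(sim_b))
        if stat >= observed:
            exceed += 1
    return exceed / n_boot


# Hypothetical 2x2 matrices, flattened row by row.
matrix_a = [40, 10, 8, 42]
matrix_b = [35, 15, 12, 38]
print(bootstrap_pvalue(matrix_a, matrix_b))
```

Because the distance is computed from every cell, two matrices with equal OA but different off-diagonal patterns produce a non-zero statistic, which is exactly the case an index-based comparison cannot detect.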
In line generalization, a first goal is to classify features prior to selecting processes and parameters. A feedforward backpropagation artificial neural network (ANN) is designed to classify a set of road lines through a supervised learning process, attempting to emulate a classification performed by a human expert for cartographic generalization purposes. The main steps of the process are presented in this paper: (a) experimental data selection; (b) segmentation of lines into homogeneous sections; (c) enrichment of the sections with a set of quantitative measures derived from a principal component analysis, and with qualitative information derived from the road network and road type; (d) expert classification of the sections; and finally (e) the ANN design, training and validation. The quality of the results is analyzed by means of error matrices after a cross-validation process, giving a goodness, or percentage of agreement, of over 83%.
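The percentage of agreement mentioned above is the share of sections on the diagonal of the error matrix. A minimal sketch, assuming the per-fold error matrices are pooled by summing them before the agreement is computed (the abstract does not say how the folds were combined); the helper names and the fold counts are invented for illustration:

```python
def add_matrices(matrices):
    """Sum same-sized error matrices cell by cell (one matrix per fold)."""
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(size)]
            for i in range(size)]


def percentage_agreement(matrix):
    """Percentage of units where the ANN class equals the expert class."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * diagonal / total


# Hypothetical two-class error matrices from two cross-validation folds
# (rows: ANN class, columns: expert class).
folds = [
    [[8, 2], [1, 9]],
    [[9, 1], [2, 8]],
]
pooled = add_matrices(folds)
print(percentage_agreement(pooled))
```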