This paper proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distributions are co-located. The framework operates in the continuous domain without the need for discretization and views regional co-location mining as a clustering problem in which an externally given fitness function has to be maximized. Interestingness of co-location patterns is assessed using products of z-scores of the relevant continuous variables. The proposed framework is evaluated by a domain expert in a case study that analyzes arsenic contamination in Texas water wells, centering on regional co-location patterns. Our approach is able to identify known and unknown regional co-location patterns, and different sets of algorithm parameters lead to the characterization of the arsenic distribution at different scales. Moreover, inconsistent co-location sets are found for regions in South Texas and West Texas that can be clearly attributed to geological differences between the two regions, emphasizing the need for regional co-location mining techniques. In addition, a novel, prototype-based region discovery algorithm named CLEVER is introduced that uses randomized hill climbing, searches for a variable number of clusters, and employs larger neighborhood sizes.
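The z-score-based interestingness measure mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function names `z_scores` and `region_interestingness`, the encoding of a pattern as a mapping from variable name to `+1` (high wing) or `-1` (low wing), the clipping of z-scores at zero, and the averaging over the objects of a region. Z-scores are computed over the whole dataset, so a region scores highly only when its objects are extreme relative to the global distribution.

```python
import numpy as np


def z_scores(values):
    """Standardize a variable over the whole dataset (population std)."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0.0:
        # A constant variable carries no signal; treat every z-score as 0.
        return np.zeros_like(values)
    return (values - values.mean()) / std


def region_interestingness(data, pattern, region_mask):
    """Hypothetical score: mean product of clipped, signed z-scores.

    data        -- dict mapping variable name to values for all objects
    pattern     -- dict mapping variable name to +1 (high) or -1 (low)
    region_mask -- boolean array selecting the objects in the region
    """
    region_mask = np.asarray(region_mask, dtype=bool)
    if not region_mask.any():
        return 0.0
    product = np.ones(int(region_mask.sum()))
    for name, direction in pattern.items():
        z = z_scores(data[name])[region_mask]
        # Only values in the requested wing contribute; others clip to 0.
        product *= np.maximum(0.0, direction * z)
    return float(product.mean())


# Toy data: two variables that are both high for the last two objects.
data = {"a": [0, 0, 2, 2], "b": [0, 0, 2, 2]}
east = [False, False, True, True]
print(region_interestingness(data, {"a": +1, "b": +1}, east))
```

Under these assumptions, an object contributes to a pattern only if every variable in the pattern lies in the requested wing, which is one simple way to express co-location in the continuous domain without discretizing the variables.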
Abstract. This paper presents a novel region discovery framework geared towards finding scientifically interesting places in spatial datasets. We view region discovery as a clustering problem in which an externally given fitness function has to be maximized. The framework adapts four representative clustering algorithms, exemplifying prototype-based, grid-based, density-based, and agglomerative clustering, and we systematically evaluate the four algorithms in a real-world case study. The task is to find feature-based hotspots where extreme densities of deep ice and shallow ice co-locate on Mars. The results reveal that the density-based algorithm outperforms the other algorithms inasmuch as it discovers more regions with higher interestingness, that the grid-based algorithm can provide acceptable solutions quickly, and that the agglomerative clustering algorithm performs best at identifying larger regions of arbitrary shape. Moreover, the results indicate that there are only a few regions on Mars where shallow and deep ground ice co-locate, suggesting that they were deposited at different geological times.