DE SOUZA, J. A. Clustering complex data for processing constrained similarity queries. 2019. 102 p. Thesis (Doctorate in Sciences - …).

Due to technological advances in recent years, both the amount and the variety of available data have increased at a fast pace. This scenario has driven the development of effective strategies for processing and summarizing such data, as well as for providing fast and automatic understanding of it. Access methods are strategies that researchers in the area have explored for these purposes. They aim to index data effectively in order to reduce the time required to process similarity queries, and they have also been applied to support data mining techniques such as clustering detection. Among access methods, metric structures are built using only distance computations between the elements of the dataset, i.e., similarity operations on the intrinsic characteristics of the data. Thus, the results do not always correspond to the context desired by users. This work explored the development of algorithms that allow metric access methods to process queries with a higher semantic load, aiming to improve the quality of the results of approaches that involve similarity operations (for example, data mining techniques and similarity queries). In this context, three approaches were developed. The first presents the method clusMAM (Unsupervised Clustering using Metric Access Methods), which aims to produce a clustering of a dataset by applying a metric access method to a summarized set. The second presents the CCkNN approach for dealing with multi-class constraints on the search space.
Finally, the third approach presents the method CfQ (Clustering for Querying), which integrates clusMAM with CCkNN and combines the strengths of each strategy. Overall, the experiments showed that the proposed methods can effectively reduce the number of similarity computations required when processing techniques based on distance computations.
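The claim that metric access methods reduce similarity computations rests on the triangle inequality. As a minimal sketch of that general principle (not clusMAM, CCkNN or CfQ, whose internals the abstract does not describe), the hypothetical `PivotIndex` below stores each element's distance to a single pivot p and uses the lower bound |d(q,p) - d(x,p)| ≤ d(q,x) to skip elements during a k-nearest-neighbor query. The single-pivot design and all names are illustrative assumptions.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


class PivotIndex:
    """Hypothetical single-pivot index: stores d(x, pivot) for every element."""

    def __init__(self, data: Sequence[Point], pivot: Point,
                 dist: Callable[[Point, Point], float] = math.dist):
        self.data = list(data)
        self.pivot = pivot
        self.dist = dist
        # Precomputed once at build time.
        self.to_pivot = [dist(x, pivot) for x in self.data]

    def knn(self, q: Point, k: int) -> Tuple[List[Tuple[float, int]], int]:
        """Return (sorted [(distance, index)], number of distance computations)."""
        d_qp = self.dist(q, self.pivot)
        computed = 1
        # Visit candidates in increasing order of their triangle-inequality lower bound.
        order = sorted(range(len(self.data)),
                       key=lambda i: abs(d_qp - self.to_pivot[i]))
        heap: List[Tuple[float, int]] = []  # max-heap of the k best, via negated distances
        for i in order:
            lower = abs(d_qp - self.to_pivot[i])
            if len(heap) == k and lower >= -heap[0][0]:
                break  # all remaining lower bounds are at least this large
            d = self.dist(q, self.data[i])
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
        return sorted((-nd, i) for nd, i in heap), computed
```

For 100 points on a line with the pivot at the origin, a 3-NN query at (50.2, 0.0) returns indices 50, 51 and 49 after only 4 distance computations instead of 100. Real metric trees use many pivots arranged hierarchically, but the pruning rule is the same.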
Can we use information from social media and crowdsourced images to detect smoke and assist rescue forces? While there are computer vision methods for detecting smoke, they require movement information extracted from video data. In this paper we propose SmokeBlock, a method that segments and detects smoke in still images. SmokeBlock uses superpixel segmentation and extracts local color and texture features from images to spot smoke. We used real data from Flickr and compared SmokeBlock against state-of-the-art feature extraction methods. For the task of smoke detection, our method outperformed its competitors. Our findings should support further investigations in the field of image analysis, in particular concerning images captured with mobile devices.
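The abstract does not say which color and texture descriptors SmokeBlock uses, so the following is only a minimal sketch of the general idea of local color plus texture features per image region. To stay dependency-free beyond NumPy, it assumes fixed square blocks as a stand-in for superpixels, per-channel RGB histograms for color, and a basic 8-neighbor local binary pattern (LBP) histogram for texture. All function names are hypothetical.

```python
import numpy as np


def color_histogram(block: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenated per-channel histograms of an RGB uint8 block, L1-normalized."""
    feats = [np.histogram(block[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(feats).astype(float)
    return h / h.sum()


def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """256-bin histogram of basic 8-neighbor LBP codes over interior pixels."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbor offsets, clockwise from top-left; each sets one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def block_features(image: np.ndarray, size: int = 16) -> np.ndarray:
    """One feature row per non-overlapping size x size block (edge remainders dropped)."""
    gray = image.astype(float) @ np.array([0.299, 0.587, 0.114])
    rows = []
    for y in range(0, image.shape[0] - size + 1, size):
        for x in range(0, image.shape[1] - size + 1, size):
            rows.append(np.concatenate([
                color_histogram(image[y:y + size, x:x + size]),
                lbp_histogram(gray[y:y + size, x:x + size]),
            ]))
    return np.array(rows)
```

Each row (24 color bins plus 256 texture bins) could then be fed to any classifier that labels blocks as smoke or non-smoke; that classification step is not shown here.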
Background: There are several application scenarios that can take advantage of the efficient processing of similarity operations on complex data types, such as multimedia data. Among them are the execution of more complex query types (e.g., similarity queries) and several well-known data mining algorithms (e.g., data clustering) that are directly based on similarity computations. To speed up the similarity-based comparisons performed by these approaches, the dataset can be stored in specialized data structures known as metric access methods (MAM). Methods: In this article we present four node split policies that can be employed in the construction of M-tree, the pioneer dynamic MAM, and of Slim-tree, the M-tree successor. Results: These policies allow faster tree construction, as they result in a better distribution of elements over the tree nodes and require fewer distance calculations than the previously proposed ones. Furthermore, trees built with these policies have been shown to be more efficient for techniques that require similarity computations, such as nearest neighbor queries and data clustering algorithms. Conclusion: The experimental results show that trees built with the proposed policies outperform those built with the original ones with regard to the number of disk accesses, the number of distance calculations, and the time required to run the queries.
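The abstract does not detail its four policies, so this sketch instead shows why split cost matters, using one of the original M-tree policy combinations as an assumed example of the kind of baseline being improved on: mM_RAD promotion (choose the pair of routing objects whose larger covering radius is smallest) with generalized hyperplane partitioning (each entry joins its closer routing object). The function name is hypothetical, and minimum node occupancy is deliberately not enforced.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def split_mm_rad(entries: Sequence[Point],
                 dist: Callable[[Point, Point], float] = math.dist):
    """Split an overflowing node into two groups (simplified mM_RAD + hyperplane).

    Returns (pivot_a, group_a, pivot_b, group_b, distance_computations).
    Minimum node occupancy is not enforced in this sketch.
    """
    n = len(entries)
    d = [[0.0] * n for _ in range(n)]
    computed = 0
    for i, j in itertools.combinations(range(n), 2):
        d[i][j] = d[j][i] = dist(entries[i], entries[j])
        computed += 1  # n(n-1)/2 in total, paid on every split

    best = None
    for a, b in itertools.combinations(range(n), 2):
        # Generalized hyperplane: each entry goes to its closer candidate pivot.
        group_a = [i for i in range(n) if d[i][a] <= d[i][b]]
        group_b = [i for i in range(n) if d[i][a] > d[i][b]]
        if not group_b:
            continue  # a and b coincide, so this pair cannot split the node
        cost = max(max(d[i][a] for i in group_a), max(d[i][b] for i in group_b))
        if best is None or cost < best[0]:
            best = (cost, a, group_a, b, group_b)
    if best is None:
        raise ValueError("need at least two distinct entries to split")
    _, a, group_a, b, group_b = best
    return (entries[a], [entries[i] for i in group_a],
            entries[b], [entries[i] for i in group_b], computed)
```

For six points forming two tight clusters, the sketch promotes (0, 0) and (10, 10) and separates the clusters after 15 distance computations. The quadratic number of distance computations per split is exactly the construction cost that cheaper split policies aim to reduce.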
Most similarity searches on multimedia data are performed in metric spaces with the aid of data structures known as metric access methods (MAM). Herein, we present three new node split strategies for the construction of M-tree, the pioneer dynamic MAM, and of Slim-tree. These strategies result in a better distribution of elements over the tree nodes and require fewer distance calculations than the previously proposed ones. Moreover, trees built with these strategies have been shown to be more efficient for similarity queries, such as nearest neighbor queries. The experimental results show that trees built with the proposed strategies outperform those built with the original ones with regard to the number of disk accesses, the number of distance calculations, and the time required to run the queries.
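The original Slim-tree is known for a split based on a minimum spanning tree (MST) of the overflowing node's entries. The following is a simplified sketch of that idea, not one of the three new strategies (which the abstract does not describe): build the MST with Prim's algorithm, cut its longest edge to obtain two groups, and choose as each group's representative the member with the smallest covering radius. The function name is hypothetical, and occupancy balancing is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def mst_split(entries: Sequence[Point],
              dist: Callable[[Point, Point], float] = math.dist
              ) -> Tuple[Point, List[Point], Point, List[Point]]:
    """Split entries in two by cutting the longest edge of their MST.

    Returns (representative_a, group_a, representative_b, group_b).
    """
    n = len(entries)
    if n < 2:
        raise ValueError("need at least two entries to split")
    # Symmetric distance matrix: n(n-1)/2 distance computations.
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(entries[i], entries[j])

    # Prim's algorithm on the complete graph, starting from entry 0.
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each entry to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[float, int, int]] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and d[u][v] < best[v]:
                best[v] = d[u][v]
                parent[v] = u

    # Removing the longest MST edge leaves exactly two connected components.
    cut = max(range(len(edges)), key=lambda k: edges[k][0])
    adj: List[List[int]] = [[] for _ in range(n)]
    for k, (_, u, v) in enumerate(edges):
        if k != cut:
            adj[u].append(v)
            adj[v].append(u)
    side = {0}
    stack = [0]
    while stack:
        for v in adj[stack.pop()]:
            if v not in side:
                side.add(v)
                stack.append(v)
    group_a = [i for i in range(n) if i in side]
    group_b = [i for i in range(n) if i not in side]

    def representative(group: List[int]) -> int:
        # Member with the smallest covering radius over its own group.
        return min(group, key=lambda c: max(d[c][i] for i in group))

    ra, rb = representative(group_a), representative(group_b)
    return (entries[ra], [entries[i] for i in group_a],
            entries[rb], [entries[i] for i in group_b])
```

On two well-separated clusters the longest MST edge is the bridge between them, so cutting it separates the clusters without testing every pair of candidate representatives.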