Jessica A. de Souza scite author profile

DE SOUZA, J. A. Clustering complex data for processing constrained similarity queries 2 . 2019. 102 p. Tese (Doutorado em Ciências -Due to the technological advances over the last years, both the amount and variety of data available have been increased at a fast pace. Thus, this scenario has influenced the development of effective strategies for the processing, summarizing, as well as to provide fast and automatic understanding of such data. The Access Methods are strategies that have been explored by researchers in the area to aid these purposes. These methods aim to effectively index data to reduce the time required for processing similarity querying. In addition, they have been applied to aid the processing of Data Mining techniques, such as Clustering Detection. Among the access methods, the metric structures are constructed applying only the criterion based on the distance computation between the elements of the dataset, i.e. similarity operations on the intrinsic characteristics of the dataset. Thus, the results do not always correspond to the context desired by users.This work explored the development of algorithms that allow metric access methods to process queries with a higher semantic load, aimed at contributing to the treatment of the quality question on the results of approaches that involve similarity operation (for example, data mining techniques and similarity queries). In this context, three approaches have been developed: the first approach presents the method clusMAM (Unsupervised Clustering using Metric Access Methods), which aims to display a clustering from a dataset with the application of a Metric Access Method from a summarized set. The second approach presents the CCkNN approach to dealing with the problem of multi-class constraints on the search space. Finally, the third proposal presents the method CfQ (Clustering for Querying) by integrating the techniques clusMAM with CCkNN, using the positive points of each strategy applied by the algorithms. In general, the experiments carried out showed that the proposed methods can contribute to an effective way of reducing similarity computations, which is required during a processing of techniques that are based on distance computations.

show abstract

Unveiling smoke in social images with the SmokeBlock approach

Cazzolato

Bedo

Costa

et al. 2016

View full text Add to dashboard Cite

Can we use information from social media and crowdsourced images to detect smoke and assist rescue forces? While there are computer vision methods for detecting smoke, they require movement information extracted from video data. In this paper we propose SmokeBlock: a method that is able to segment and detect smoke in still images. SmokeBlock uses superpixel segmentation and extracts local color and texture features from images to spot smoke. We used real data from Flickr and compared SmokeBlock against state-of-theart methods for feature extraction. Our method achieved performance superior than the competitors, for the task of smoke detection. Our findings shall support further investigations in the field of image analysis, in particular, concerning images captured with mobile devices.

show abstract

Optimizing metric access methods for querying and mining complex data types

2014

View full text Add to dashboard Cite

Background: There are several application scenarios that can take advantage from the efficient processing of similarity operations in complex data types, such as multimedia data. Among them, it is possible to mention the execution of more complex query types (e.g., similarity queries) and several well-known data mining algorithms (e.g., data clustering) that are directly based on similarity computations. In order to speed up the similarity-based comparisons performed by these approaches, it is possible to store the dataset in specialized data structures known as metric access methods (MAM). Methods: In this article we present four node split policies that can be employed in the construction of M-tree, the pioneer dynamic MAM, and of Slim-tree, the M-tree successor. Results: These policies allow faster tree construction, as they result in better distribution of elements on the tree nodes and require less distance calculations when compared with the previously proposed ones. Furthermore, trees built with these policies have shown to be more efficient for techniques that require similarity computations, such as nearest neighbors queries and data clustering algorithms. Conclusion:The experimental results show that trees built with the proposed policies outperform those built with the original ones with regard to the number of disk accesses, the amount of distance calculations, and the time required to run the queries.

show abstract

Faster construction of ball-partitioning-based metric access methods

Souza

Razente

Barioni

2013

View full text Add to dashboard Cite

Most similarity search techniques for multimedia data is performed in metric spaces and with the aid of data structures known as metric access methods (MAM). Herein, we present three new node split strategies for M-tree and Slim-tree construction, the pioneer dynamic MAM. These strategies result in better distribution of elements on the tree nodes and require less distance calculations when compared with the previously proposed ones. Moreover, trees built with these strategies have shown to be more efficient for similarity queries, such as nearest neighbors. The experimental results show that trees built with the proposed strategies outperform those built with the original ones with regard to the number of disk accesses, the amount of distance calculations and time required to run the queries.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.