For reuse practices to be effective, developers must be able to find the components they need easily. To this end, representation models based on semi-structured data have been adopted to facilitate component discovery. Following this trend, this paper presents the architecture, functionality, and implementation of a search service that applies indexing techniques for semi-structured data, enabling the discovery of software assets through regular path expression queries.
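As a rough illustration of the kind of query such a search service answers, the sketch below runs path expressions over a small XML catalogue of software assets using the limited XPath subset (ElementPath) built into Python's standard `xml.etree.ElementTree`. The catalogue layout, the element names (`component`, `interface`, `operation`, `language`) and the helper functions are hypothetical. ElementPath supports only a restricted subset of regular path expressions (child steps, the `*` wildcard, `//` descendant steps and simple predicates), not full alternation or Kleene closure over labels, and nothing here reflects the service's actual implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical asset catalogue; element and attribute names are assumptions.
CATALOGUE = """
<repository>
  <component id="c1">
    <name>LoggerLib</name>
    <interface><operation>log</operation><operation>flush</operation></interface>
    <language>Java</language>
  </component>
  <component id="c2">
    <name>CsvParser</name>
    <interface><operation>parse</operation></interface>
    <language>Python</language>
  </component>
</repository>
"""


def find_components(xml_text: str, path: str) -> list[str]:
    """Return the id attributes of the elements matched by an ElementPath expression."""
    root = ET.fromstring(xml_text)
    return [el.get("id") for el in root.findall(path)]


def find_texts(xml_text: str, path: str) -> list[str]:
    """Return the text content of the elements matched by an ElementPath expression."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(path)]


# Components (at any depth) that have a <language> child whose text is "Java".
print(find_components(CATALOGUE, ".//component[language='Java']"))
# Every operation exposed by any child of the root, in document order.
print(find_texts(CATALOGUE, "./*/interface/operation"))
```

Matches are returned in document order, so the second query lists the operations of `c1` before those of `c2`.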
One of the major challenges in developing indexing techniques for semi-structured data is how to index the structural properties of the data. The main issue is how to handle branching path expressions efficiently without an undesired growth in query processing costs and index file sizes. Several proposals for indexing semi-structured data can be found in the literature. However, to keep index files small, most of them neither index nor handle branching path expressions, and those that do usually suffer from high query processing costs and large index files. In this context, this paper proposes a path-based indexing technique for semi-structured data that handles a well-defined class of branching path expressions. As the experimental evaluation shows, the proposed technique achieves excellent query processing times and produces index files close in size to the input data files.
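A minimal sketch of a generic path index, written under stated assumptions rather than following the paper's design: every root-to-node label path is mapped to the pre-order ids of the nodes it reaches, so a linear path query is a single dictionary probe. A branching query of the form `trunk[predicate]/output` then needs an extra step that matches the two branches through their common parent; that join is where query cost can grow, which is the tension described above. `Node`, `PathIndex` and the sample labels are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Sequence
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a semi-structured document tree (hypothetical model)."""
    label: str
    children: list[Node] = field(default_factory=list)
    text: str | None = None


class PathIndex:
    """Maps every root-to-node label path to the pre-order ids of the nodes it reaches."""

    def __init__(self, root: Node) -> None:
        self.paths: dict[tuple[str, ...], list[int]] = defaultdict(list)
        self.parent: dict[int, int] = {}
        self.nodes: list[Node] = []
        self._visit(root, (), None)

    def _visit(self, node: Node, prefix: tuple[str, ...], parent_id: int | None) -> None:
        node_id = len(self.nodes)  # pre-order numbering
        self.nodes.append(node)
        path = prefix + (node.label,)
        self.paths[path].append(node_id)
        if parent_id is not None:
            self.parent[node_id] = parent_id
        for child in node.children:
            self._visit(child, path, node_id)

    def lookup(self, path: Sequence[str]) -> list[int]:
        """Linear path query: one dictionary probe, no joins."""
        return list(self.paths.get(tuple(path), []))

    def branch(self, trunk: Sequence[str], predicate: str, output: str) -> list[int]:
        """Branching query trunk[predicate]/output.

        Returns the `output` children of trunk nodes that also have a `predicate`
        child; the two branches are matched through their shared parent id.
        """
        trunk = tuple(trunk)
        if not trunk:
            raise ValueError("trunk must contain at least the root label")
        with_predicate = {self.parent[n] for n in self.lookup(trunk + (predicate,))}
        return [n for n in self.lookup(trunk + (output,)) if self.parent[n] in with_predicate]


repo = Node("repository", [
    Node("component", [Node("name", text="LoggerLib"), Node("license", text="MIT")]),
    Node("component", [Node("name", text="CsvParser")]),
])
index = PathIndex(repo)
matches = index.branch(["repository", "component"], "license", "name")
print([index.nodes[n].text for n in matches])  # ['LoggerLib']
```

Storing only label paths keeps the index compact, but every additional branch adds another parent-level intersection at query time; indexing selected branching patterns directly is one way to trade index size for fewer joins.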
Several component repositories have been proposed with the aim of fostering software reuse. However, current proposals still adopt local and centralized approaches, which inhibit large-scale reuse. In this context, this article presents a shared, distributed repository service that integrates access control, version control, and reuse-metrics management facilities. As an innovation, the proposed repository can be used in distributed development settings, in which remote teams share software artifacts.
The explosive growth of web-based information systems has created many sources and vast quantities of semi-structured data, which must be indexed by search engines so that documents can be retrieved according to user needs. However, one of the major challenges in developing indexing techniques for semi-structured data is how to index not only textual but also structural content. The main issue is how to handle branching path expressions efficiently without introducing precision loss or an undesired increase in query processing costs and index file sizes. Several proposals for indexing semi-structured data can be found in the literature. Despite their relevant contributions, existing proposals suffer from at least one of three problems: precision loss, high storage space requirements, or high query processing costs. In this context, this paper proposes an efficient, lossless path-based indexing technique for semi-structured data that handles a well-defined class of branching path expressions while preserving one-to-many relationships among elements. As the experimental evaluation shows, the proposed technique achieves excellent query processing times and produces smaller index files than structural join indexing techniques.
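To make the precision-loss problem concrete: if an index records only which document contains each path and value, a branching query such as `component[language='Java'][license='MIT']` cannot tell whether both conditions hold on the same component. The sketch below is an illustrative assumption, not the technique proposed in the paper. It keeps the ordinal of the enclosing `component` in every posting, so the one-to-many relationship between a document and its components survives, and contrasts this with a document-level intersection. The data, field names and functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: each document holds several <component> elements
# (a one-to-many relationship), each with a few child fields.
DOCS = {
    "d1": [
        {"name": "LoggerLib", "language": "Java"},
        {"name": "CsvParser", "license": "MIT"},
    ],
    "d2": [
        {"name": "HttpKit", "language": "Java", "license": "MIT"},
    ],
}

Posting = tuple[str, int]  # (document id, ordinal of the enclosing component)


def build_postings(docs: dict[str, list[dict[str, str]]]) -> dict[tuple[str, ...], set[Posting]]:
    """Index each (path, value) pair together with the component it belongs to."""
    postings: dict[tuple[str, ...], set[Posting]] = defaultdict(set)
    for doc_id, components in docs.items():
        for ordinal, fields in enumerate(components):
            for tag, value in fields.items():
                postings[("component", tag, value)].add((doc_id, ordinal))
    return postings


def lossy_match(postings, *conditions: tuple[str, str]) -> set[str]:
    """Document-level intersection: forgets which component satisfied each condition."""
    if not conditions:
        return set()
    doc_sets = [{doc for doc, _ in postings.get(("component", *c), set())} for c in conditions]
    return set.intersection(*doc_sets)


def lossless_match(postings, *conditions: tuple[str, str]) -> set[str]:
    """Element-level intersection: every condition must hold on the same component."""
    if not conditions:
        return set()
    element_sets = [postings.get(("component", *c), set()) for c in conditions]
    return {doc for doc, _ in set.intersection(*element_sets)}


POSTINGS = build_postings(DOCS)
QUERY = [("language", "Java"), ("license", "MIT")]
print(sorted(lossy_match(POSTINGS, *QUERY)))     # ['d1', 'd2']
print(sorted(lossless_match(POSTINGS, *QUERY)))  # ['d2']
```

Here `d1` is a false positive for the lossy variant: LoggerLib is written in Java and CsvParser is MIT-licensed, but no single component satisfies both conditions. The lossless variant pays for this with larger postings (one entry per element rather than per document), which is the storage-versus-precision trade-off described above.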