Abstract. Web warehousing plays a key role in providing the managers with up-to-date and comprehensive information about their business domain. On the other hand, since XML is now a standard de facto for the exchange of semi-structured data, integrating XML data into web warehouses is a hot topic. In this paper we propose a semi-automated methodology for designing web warehouses from XML sources modeled by XML Schemas. In the proposed methodology, design is carried out by first creating a schema graph, then navigating its arcs in order to derive a correct multidimensional representation. Differently from previous approaches in the literature, particular relevance is given to the problem of detecting shared hierarchies and convergence of dependencies, and of modeling many-to-many relationships. The approach is implemented in a prototype that reads an XML Schema and produces in output the logical schema of the warehouse.
Abstract.A federated data warehouse is a logical integration of data warehouses applicable when physical integration is impossible due to privacy policy or legal restrictions. In order to enable the translation of queries in a federated approach, schemas of the federated and the local warehouses must be matched. In this paper we present a procedure that enables the matching process for schema structures specific to the multidimensional model of data warehouses: facts, measures, dimensions, aggregation levels and dimensional attributes. Similarities between warehouse-specific structures are computed by using linguistic and structural comparison, where calculated values are used to create necessary mappings. We present restriction rules and recommendations for aggregation level matching, which builds the most complex part of the process. A software implementation of the entire process is provided in order to perform its verification, as well as to determine the proper selection metric for mapping different multidimensional structures.
A federated data warehouse is a logical integration of data warehouses applicable when physical integration is impossible due to privacy policy or legal restrictions. In healthcare systems federated data warehouses are a most feasible source of data for deducing guidelines for evidence-based medicine based on data material from different participating institutions. In order to enable the translation of queries in a federated approach, schemas of the federated warehouse and the local warehouses must be matched. In this paper we present a procedure that enables the matching process for schema structures specific to the multidimensional model of data warehouses: facts, measures, dimensions, aggregation levels and dimensional attributes. Similarities between warehouse-specific structures are computed by using linguistic and structural comparison. The calculated values are used to create necessary mappings.
Anomaly detection is a hard data analysis process that requires constant creation and improvement of data analysis algorithms. Using traditional clustering algorithms to analyse data streams is impossible due to processing power and memory issues. To solve this, the traditional clustering algorithm complexity needed to be reduced, which led to the creation of sequential clustering algorithms. The usual approach is two-phase clustering, which uses online phase to relax data details and complexity, and offline phase to cluster concepts created in the online phase. Detecting anomalies in a data stream is usually solved in the online phase, as it requires unreduced data. Contrarily, producing good macro-clustering is done in the offline phase, which is the reason why two-phase clustering algorithms have difficulty being equally good in anomaly detection and macro-clustering. In this paper, we propose a statistical hierarchical clustering algorithm equally suitable for both detecting anomalies and macro-clustering. The proposed algorithm is single-phased and uses statistical inference on the input data stream, resulting in statistical distributions that are constantly updated. This makes the classification adaptable, allowing agglomeration of outliers into clusters, tracking population evolution, and to be used without knowing the expected number of clusters and outliers. The proposed algorithm was tested against typical clustering algorithms, including two-phase algorithms suitable for data stream analysis. A number of typical test cases were selected, to show the universality and qualities of the proposed clustering algorithm.
Today, there is a rapid increase of the available data because of advances in information and communications technology. Therefore, many mutually heterogeneous data sources that describe the same domain of interest exist. To facilitate the integration of these heterogeneous data sources, an ontology can be used as it enriches the knowledge of a data source by giving a detailed description of entities and their mutual relations within the domain of interest. Ontology matching is a key issue in integrating heterogeneous data sources described by ontologies as it eases the management of data coming from various sources. The ontology matching system consists of several basic matchers.To determine high-quality correspondences between entities of compared ontologies, the matching results of these basic matchers should be aggregated by an aggregation method. In this paper, a new weighted aggregation method for parallel composition of basic matchers based on genetic algorithm is presented. The evaluation has confirmed a high quality of the new aggregation method as this method has improved the process of matching two ontologies by obtaining higher confidence values of correctly found correspondences and thus increasing the quality of matching results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.