Abstract: The problem of allocating the data of a database to the sites of a communication network is investigated. This problem deviates from the well-known file allocation problem in several aspects. First, the objects to be allocated are not known a priori; second, these objects are accessed by schedules that contain transmissions between objects to produce the result. A model that makes it possible to compare the cost of allocations is presented; the cost can be computed for different cost functions and for processi…
“…There are methods that do only fragmentation [1,24,26,33,34] and methods that do only allocation of predefined fragments [3,4,7,10,13,20,30]. Some methods also exist that integrate both tasks [9,11,17,19,25,27,29].…”
In distributed database systems, tables are frequently fragmented and replicated over a number of sites in order to reduce network communication costs. How to fragment, when to replicate, and how to allocate the fragments to the sites are challenging problems that have previously been solved either by static fragmentation, replication, and allocation, or based on a priori query analysis. Many emerging applications of distributed database systems generate very dynamic workloads with frequent changes in access patterns from different sites. In such contexts, continuous refragmentation and reallocation can significantly improve performance. In this paper we present DYFRAM, a decentralized approach for dynamic table fragmentation and allocation in distributed database systems based on observation of the access patterns of sites to tables. The approach performs fragmentation, replication, and reallocation based on recent access history, aiming at maximizing the number of local accesses compared to accesses from remote sites. We show through simulations and experiments on the DASCOSA distributed database system that the approach significantly reduces communication costs for typical access patterns, thus demonstrating the feasibility of our approach.
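The core idea of reallocating a fragment based on recent access history can be illustrated with a small sketch. This is not DYFRAM's actual algorithm; the sliding-window size, the dominance threshold, and all names here are hypothetical simplifications of "migrate a fragment when remote accesses dominate local ones":

```python
from collections import defaultdict, deque

class FragmentMonitor:
    """Hypothetical sketch: each fragment keeps a sliding window of recent
    accesses per site; a fragment migrates when a remote site clearly
    dominates its recent access history."""

    def __init__(self, window=100, migrate_ratio=2.0):
        self.window = window                  # number of recent accesses kept
        self.migrate_ratio = migrate_ratio    # remote/local dominance threshold
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record_access(self, fragment, site):
        self.history[fragment].append(site)

    def best_site(self, fragment, current_site):
        """Return the site the fragment should live on, given recent history."""
        counts = defaultdict(int)
        for site in self.history[fragment]:
            counts[site] += 1
        if not counts:
            return current_site
        top_site = max(counts, key=counts.get)
        local = counts.get(current_site, 0)
        # migrate only when the remote site clearly dominates local accesses
        if top_site != current_site and counts[top_site] >= self.migrate_ratio * max(local, 1):
            return top_site
        return current_site

mon = FragmentMonitor(window=10)
for _ in range(8):
    mon.record_access("orders#1", "siteB")
mon.record_access("orders#1", "siteA")
print(mon.best_site("orders#1", "siteA"))  # → siteB (remote site dominates)
```

A threshold above 1.0 adds hysteresis, so a fragment does not oscillate between two sites with roughly equal access rates.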
“…Since the complexity of the problem is NP-complete [5], heuristics are normally used to find a nearly optimal solution in a reasonable amount of time. According to the criteria used in reducing costs incurred on resources such as network bandwidth, CPUs, and disks, data placement strategies can be classified into three categories, which are network traffic based [6], size based [7], and access frequency based [8]. The main idea of these approaches is to achieve the minimal load (e.g.…”
Abstract. As XML is increasingly being used in Web applications, new technologies need to be investigated for processing XML documents with high performance. Parallelism is a promising solution for structured document processing and data placement is a major factor for system performance improvement in parallel processing. This paper describes an effective XML document data placement strategy. The new strategy is based on a multilevel graph partitioning algorithm with the consideration of the unique features of XML documents and query distributions. A new algorithm, which is based on XML query schemas to derive the weighted graph from the labelled directed graph presentation of XML documents, is also proposed. Performance analysis on the algorithm presented in the paper shows that the new data placement strategy exhibits low workload skew and a high degree of parallelism.

Keywords: Data Placement, XML Documents, Graph Partitioning, and Parallel Data Processing.
Introduction

As a new markup language for structured documentation, XML (eXtensible Markup Language) is increasingly being used in Web applications because of its unique features in data representation and exchange. The main advantage of XML is that each XML file can have a semantic schema, which makes it possible to define much more meaningful queries than simple, keyword-based retrievals. A recent survey shows that the number of XML business vocabularies has increased from 124 to over 250 in six months [1]. It can be expected that data in XML format will be largely available throughout the Web in the near future. As Web applications are time-sensitive, the increasing size of XML documents and the complexity of evaluating XML queries pose new performance challenges to existing information retrieval technologies. The use of parallelism has shown good scalability in traditional database applications and provides an attractive solution for processing structured documents [2]. A large number of XML documents can be distributed onto several processing nodes so that a reasonable query response time can be achieved by processing the related data in parallel.
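The abstract above describes a multilevel graph partitioning strategy; as a much simpler stand-in for the balancing step, the sketch below greedily assigns XML elements to processing nodes by an assumed query-access frequency. The element paths and frequencies are invented for illustration, and this LPT-style heuristic is not the paper's algorithm:

```python
import heapq

def greedy_partition(weights, k):
    """Sketch: assign weighted elements to k nodes, keeping loads balanced.
    weights: {element: access_frequency}; returns {element: node_id}."""
    # min-heap of (current_load, node_id): always place on the lightest node
    heap = [(0, node) for node in range(k)]
    heapq.heapify(heap)
    assignment = {}
    # heaviest elements first: classic longest-processing-time heuristic
    for elem, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        assignment[elem] = node
        heapq.heappush(heap, (load + w, node))
    return assignment

# hypothetical query frequencies per element path
freqs = {"/book": 50, "/book/title": 30, "/book/author": 30, "/book/price": 20}
print(greedy_partition(freqs, 2))
```

A real multilevel partitioner additionally coarsens the graph and minimizes edge cuts between partitions, which matters when queries traverse parent-child edges across nodes.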
“…However, rapid updates (or writes) may counteract the replication benefit because of the overhead in maintaining a large number of replicas [41]. With reads and updates, the locations of the replicas have to be: (1) in close proximity to the client(s), and (2) in close proximity to the primary (assuming a broadcast update model) copy [33]. Therefore, the efficiency of a replication scheme depends strongly on the number of replicas and the selection of the placement sites [42].…”
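The read/update trade-off described in the excerpt can be made concrete with a small cost sketch. All sites, rates, and distances below are assumed for illustration; the model is the simple one the excerpt implies (reads go to the nearest replica, each update is broadcast from the primary to every other replica):

```python
def replication_cost(replicas, primary, read_rates, update_rate, dist):
    """Sketch of total network cost for a replica placement.
    replicas: set of sites holding a copy; read_rates: {client: reads/sec};
    dist: {(a, b): network distance}; returns weighted read + update cost."""
    # each client reads from its nearest replica
    read_cost = sum(rate * min(dist[(client, r)] for r in replicas)
                    for client, rate in read_rates.items())
    # the primary broadcasts every update to all other replicas
    update_cost = update_rate * sum(dist[(primary, r)]
                                    for r in replicas if r != primary)
    return read_cost + update_cost

# hypothetical topology: client C1 is far from primary A, close to B
dist = {("C1", "A"): 5, ("C1", "B"): 1, ("A", "B"): 3}
print(replication_cost({"A"}, "A", {"C1": 10}, 2, dist))       # → 50
print(replication_cost({"A", "B"}, "A", {"C1": 10}, 2, dist))  # → 16
```

With a low update rate the extra replica at B pays for itself; raising `update_rate` past the read savings flips the decision, which is exactly the counteracting effect the excerpt describes.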
Section: Introduction
“…File allocation: File allocation has been a popular line of research in distributed computing [14,44], distributed databases [2], multimedia databases [54], paging algorithms [16], and video server systems [54]. The generalized file allocation problem for multiple objects [11] has been proven to be NP-complete [15].…”
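Since the generalized file allocation problem is NP-complete, exact solutions are only feasible for tiny instances. The exhaustive search below illustrates the objective (total access cost over all site-file pairs) under invented access rates and distances; real systems use heuristics instead:

```python
from itertools import product

def optimal_allocation(files, sites, access, dist):
    """Brute-force sketch: try every placement of files onto sites.
    access: {(site, file): access rate}; dist: {(a, b): cost per access}."""
    best, best_cost = None, float("inf")
    for placement in product(sites, repeat=len(files)):
        # total cost: each site's accesses travel to the file's host
        cost = sum(access.get((site, f), 0) * dist[(site, placement[i])]
                   for i, f in enumerate(files)
                   for site in sites)
        if cost < best_cost:
            best, best_cost = dict(zip(files, placement)), cost
    return best, best_cost

files, sites = ["f1", "f2"], ["s1", "s2"]
access = {("s1", "f1"): 10, ("s2", "f2"): 10}   # each site uses one file
dist = {("s1", "s1"): 0, ("s1", "s2"): 1, ("s2", "s1"): 1, ("s2", "s2"): 0}
print(optimal_allocation(files, sites, access, dist))  # each file lands locally
```

The search space is |sites|^|files|, which is why the NP-completeness result [15] pushes practical work toward the heuristic methods surveyed above.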