Abstract: The problem of allocating the data of a database to the sites of a communication network is investigated. This problem deviates from the well-known file allocation problem in several aspects. First, the objects to be allocated are not known a priori; second, these objects are accessed by schedules that contain transmissions between objects to produce the result. A model that makes it possible to compare the cost of allocations is presented; the cost can be computed for different cost functions and for processi…
“…There are methods that do only fragmentation [1,24,26,33,34] and methods that do only allocation of predefined fragments [3,4,7,10,13,20,30]. Some methods also exist that integrate both tasks [9,11,17,19,25,27,29].…”
In distributed database systems, tables are frequently fragmented and replicated over a number of sites in order to reduce network communication costs. How to fragment, when to replicate, and how to allocate the fragments to the sites are challenging problems that have previously been solved either by static fragmentation, replication, and allocation, or based on a priori query analysis. Many emerging applications of distributed database systems generate very dynamic workloads with frequent changes in access patterns from different sites. In such contexts, continuous refragmentation and reallocation can significantly improve performance. In this paper we present DYFRAM, a decentralized approach for dynamic table fragmentation and allocation in distributed database systems based on observation of the access patterns of sites to tables. The approach performs fragmentation, replication, and reallocation based on recent access history, aiming at maximizing the number of local accesses compared to accesses from remote sites. We show through simulations and experiments on the DASCOSA distributed database system that the approach significantly reduces communication costs for typical access patterns, thus demonstrating the feasibility of our approach.
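The core idea of reallocating a fragment based on recent access history can be illustrated with a small sketch. This is not DYFRAM's actual algorithm; the sliding-window size, the dominance threshold, and all names here are hypothetical simplifications of "migrate a fragment when remote accesses dominate local ones":

```python
from collections import defaultdict, deque

class FragmentMonitor:
    """Hypothetical sketch: each fragment keeps a sliding window of recent
    accesses per site; a fragment migrates when a remote site clearly
    dominates its recent access history."""

    def __init__(self, window=100, migrate_ratio=2.0):
        self.window = window                  # number of recent accesses kept
        self.migrate_ratio = migrate_ratio    # remote/local dominance threshold
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record_access(self, fragment, site):
        self.history[fragment].append(site)

    def best_site(self, fragment, current_site):
        """Return the site the fragment should live on, given recent history."""
        counts = defaultdict(int)
        for site in self.history[fragment]:
            counts[site] += 1
        if not counts:
            return current_site
        top_site = max(counts, key=counts.get)
        local = counts.get(current_site, 0)
        # migrate only when the remote site clearly dominates local accesses
        if top_site != current_site and counts[top_site] >= self.migrate_ratio * max(local, 1):
            return top_site
        return current_site

mon = FragmentMonitor(window=10)
for _ in range(8):
    mon.record_access("orders#1", "siteB")
mon.record_access("orders#1", "siteA")
print(mon.best_site("orders#1", "siteA"))  # → siteB (remote site dominates)
```

A threshold above 1.0 adds hysteresis, so a fragment does not oscillate between two sites with roughly equal access rates.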
“…Since the complexity of the problem is NP-complete [5], heuristics are normally used to find a nearly optimal solution in a reasonable amount of time. According to the criteria used in reducing costs incurred on resources such as network bandwidth, CPUs, and disks, data placement strategies can be classified into three categories, which are network traffic based [6], size based [7], and access frequency based [8]. The main idea of these approaches is to achieve the minimal load (e.g.…”
Abstract. As XML is increasingly being used in Web applications, new technologies need to be investigated for processing XML documents with high performance. Parallelism is a promising solution for structured document processing and data placement is a major factor for system performance improvement in parallel processing. This paper describes an effective XML document data placement strategy. The new strategy is based on a multilevel graph partitioning algorithm with the consideration of the unique features of XML documents and query distributions. A new algorithm, which is based on XML query schemas to derive the weighted graph from the labelled directed graph presentation of XML documents, is also proposed. Performance analysis on the algorithm presented in the paper shows that the new data placement strategy exhibits low workload skew and a high degree of parallelism.

Keywords: Data Placement, XML Documents, Graph Partitioning, and Parallel Data Processing.
Introduction

As a new markup language for structured documentation, XML (eXtensible Markup Language) is increasingly being used in Web applications because of its unique features in data representation and exchange. The main advantage of XML is that each XML file can have a semantic schema, which makes it possible to define much more meaningful queries than simple, keyword-based retrievals. A recent survey shows that the number of XML business vocabularies has increased from 124 to over 250 in six months [1]. It can be expected that data in XML format will be largely available throughout the Web in the near future. As Web applications are time-sensitive, the increasing size of XML documents and the complexity of evaluating XML queries pose new performance challenges to existing information retrieval technologies. The use of parallelism has shown good scalability in traditional database applications and provides an attractive solution for processing structured documents [2]. A large number of XML documents can be distributed onto several processing nodes so that a reasonable query response time can be achieved by processing the related data in parallel.
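The abstract above describes a multilevel graph partitioning strategy; as a much simpler stand-in for the balancing step, the sketch below greedily assigns XML elements to processing nodes by an assumed query-access frequency. The element paths and frequencies are invented for illustration, and this LPT-style heuristic is not the paper's algorithm:

```python
import heapq

def greedy_partition(weights, k):
    """Sketch: assign weighted elements to k nodes, keeping loads balanced.
    weights: {element: access_frequency}; returns {element: node_id}."""
    # min-heap of (current_load, node_id): always place on the lightest node
    heap = [(0, node) for node in range(k)]
    heapq.heapify(heap)
    assignment = {}
    # heaviest elements first: classic longest-processing-time heuristic
    for elem, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        assignment[elem] = node
        heapq.heappush(heap, (load + w, node))
    return assignment

# hypothetical query frequencies per element path
freqs = {"/book": 50, "/book/title": 30, "/book/author": 30, "/book/price": 20}
print(greedy_partition(freqs, 2))
```

A real multilevel partitioner additionally coarsens the graph and minimizes edge cuts between partitions, which matters when queries traverse parent-child edges across nodes.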
“…However, rapid updates (or writes) may counteract the replication benefit because of the overhead in maintaining a large number of replicas [41]. With reads and updates, the locations of the replicas have to be: (1) in close proximity to the client(s), and (2) in close proximity to the primary (assuming a broadcast update model) copy [33]. Therefore, the efficiency of a replication scheme depends strongly on the number of replicas and the selection of the placement sites [42].…”
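The read/update trade-off described in the excerpt can be made concrete with a small cost sketch. All sites, rates, and distances below are assumed for illustration; the model is the simple one the excerpt implies (reads go to the nearest replica, each update is broadcast from the primary to every other replica):

```python
def replication_cost(replicas, primary, read_rates, update_rate, dist):
    """Sketch of total network cost for a replica placement.
    replicas: set of sites holding a copy; read_rates: {client: reads/sec};
    dist: {(a, b): network distance}; returns weighted read + update cost."""
    # each client reads from its nearest replica
    read_cost = sum(rate * min(dist[(client, r)] for r in replicas)
                    for client, rate in read_rates.items())
    # the primary broadcasts every update to all other replicas
    update_cost = update_rate * sum(dist[(primary, r)]
                                    for r in replicas if r != primary)
    return read_cost + update_cost

# hypothetical topology: client C1 is far from primary A, close to B
dist = {("C1", "A"): 5, ("C1", "B"): 1, ("A", "B"): 3}
print(replication_cost({"A"}, "A", {"C1": 10}, 2, dist))       # → 50
print(replication_cost({"A", "B"}, "A", {"C1": 10}, 2, dist))  # → 16
```

With a low update rate the extra replica at B pays for itself; raising `update_rate` past the read savings flips the decision, which is exactly the counteracting effect the excerpt describes.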
Section: Introduction
“…File allocation: File allocation has been a popular line of research in distributed computing [14,44], distributed databases [2], multimedia databases [54], paging algorithms [16], and video server systems [54]. The generalized file allocation problem for multiple objects [11] has been proven to be NP-complete [15].…”
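Since the generalized file allocation problem is NP-complete, exact solutions are only feasible for tiny instances. The exhaustive search below illustrates the objective (total access cost over all site-file pairs) under invented access rates and distances; real systems use heuristics instead:

```python
from itertools import product

def optimal_allocation(files, sites, access, dist):
    """Brute-force sketch: try every placement of files onto sites.
    access: {(site, file): access rate}; dist: {(a, b): cost per access}."""
    best, best_cost = None, float("inf")
    for placement in product(sites, repeat=len(files)):
        # total cost: each site's accesses travel to the file's host
        cost = sum(access.get((site, f), 0) * dist[(site, placement[i])]
                   for i, f in enumerate(files)
                   for site in sites)
        if cost < best_cost:
            best, best_cost = dict(zip(files, placement)), cost
    return best, best_cost

files, sites = ["f1", "f2"], ["s1", "s2"]
access = {("s1", "f1"): 10, ("s2", "f2"): 10}   # each site uses one file
dist = {("s1", "s1"): 0, ("s1", "s2"): 1, ("s2", "s1"): 1, ("s2", "s2"): 0}
print(optimal_allocation(files, sites, access, dist))  # each file lands locally
```

The search space is |sites|^|files|, which is why the NP-completeness result [15] pushes practical work toward the heuristic methods surveyed above.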