Partitioning similarity graphs: A framework for declustering problems

Liu, Duen‐Ren; Shekhar, Shashi

doi:10.1016/0306-4379(96)00024-5

Cited by 52 publications

(29 citation statements)

References 27 publications

(40 reference statements)


“…To find a near optimal solution, their approach explores a solution space by adapting the large-neighborhood search technique. However, this approach and most of the approaches mentioned above are not well suited for our underlying scientific applications that are characterized by complex workload predicates involving many attributes; and this significantly degrades the efficiency of those approaches. Graph-based approaches have been used to capture more complex relations between the workload and the data both for partitioning with the objective of declustering [13,11] and clustering [8]. They use two different models to represent data and queries: simple graph and hypergraph.…”

Section: Effect Of Imbalance Factor and Data Correlation (mentioning)

confidence: 99%


Dynamic Workload-Based Partitioning Algorithms for Continuously Growing Databases

Liroz-Gistau, Akbarinia, Pacitti, et al. (2013), Lecture Notes in Computer Science


Abstract. Applications with very large databases, where data items are continuously appended, are becoming more and more common. Thus, the development of efficient data partitioning is one of the main requirements to yield good performance. In the case of applications that have complex access patterns, e.g. scientific applications, workload-based partitioning could be exploited. However, existing workload-based approaches, which work in a static way, cannot be applied to very large databases. In this paper, we propose DynPart and DynPartGroup, two dynamic partitioning algorithms for continuously growing databases. These algorithms efficiently adapt the data partitioning to the arrival of new data elements by taking into account the affinity of new data with queries and fragments. In contrast to existing static approaches, our approach offers constant execution time, no matter the size of the database, while obtaining very good partitioning efficiency. We validated our solution through experimentation over real-world data; the results show its effectiveness.



“…In the hypergraph model [11], each query is modeled as a hyperedge (a set of vertices). In the simple graph model [13,8], queries are modeled as cliques of simple edges. Schism [8] is a recent system that partitions the data by building a graph containing the relations between queries and tuples.…”

Section: Effect Of Imbalance Factor and Data Correlation (mentioning)

confidence: 99%
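The two workload models contrasted in the statement above can be sketched as follows. This is a minimal illustration, not code from any of the cited systems: the function names `hypergraph_model` and `clique_graph_model` and the toy workload are assumptions made for the example.

```python
from itertools import combinations

def hypergraph_model(queries):
    """Hypergraph model: each query becomes ONE hyperedge,
    i.e. the set of data items (vertices) it accesses."""
    return [frozenset(q) for q in queries]

def clique_graph_model(queries):
    """Simple graph model: each query becomes a clique,
    i.e. one ordinary edge for every pair of items it accesses."""
    edges = set()
    for q in queries:
        for u, v in combinations(sorted(set(q)), 2):
            edges.add((u, v))
    return edges

# Hypothetical workload: three queries over data items a..d.
queries = [["a", "b", "c"], ["b", "c"], ["c", "d"]]
hyperedges = hypergraph_model(queries)
edges = clique_graph_model(queries)
```

In this toy workload the hypergraph keeps three hyperedges, one per query, while the simple graph flattens them into four pairwise edges. The pair (b, c) is shared by two queries, and the unweighted clique representation no longer records which query contributed it.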

Dynamic Workload-Based Partitioning Algorithms for Continuously Growing Databases

Liroz-Gistau, Akbarinia, Pacitti, et al. (2013), Lecture Notes in Computer Science


“…There are also a few studies that propose to exploit query distribution information [2], [3], [4], if such information is available. For equal-sized data items, the total response time for a given query set can be minimized by evenly distributing the data items requested by each query across the disks as much as possible, while taking query frequencies into consideration.…”

Section: Introduction (mentioning)

confidence: 99%
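The objective described in the statement above can be made concrete with a small cost model. The sketch below assumes, beyond what the quoted text states, that every disk serves one equal-sized item per time unit and that a query finishes when its most heavily loaded disk finishes; the function names and the toy placements are hypothetical.

```python
from collections import Counter

def query_response_time(query_items, disk_of):
    """Parallel response time of one query under the assumed cost model:
    the largest number of requested items stored on any single disk."""
    load = Counter(disk_of[item] for item in query_items)
    return max(load.values()) if load else 0

def total_response_time(queries, freqs, disk_of):
    """Frequency-weighted sum of per-query response times."""
    return sum(f * query_response_time(q, disk_of)
               for q, f in zip(queries, freqs))

# Hypothetical workload: the second query is issued three times as often.
queries = [["a", "b", "c", "d"], ["a", "b"]]
freqs = [1, 3]

clustered = {"a": 0, "b": 0, "c": 1, "d": 1}    # a and b share a disk
declustered = {"a": 0, "b": 1, "c": 0, "d": 1}  # a and b are split

cost_clustered = total_response_time(queries, freqs, clustered)      # 8
cost_declustered = total_response_time(queries, freqs, declustered)  # 5
```

Both placements balance the four-item query equally well. The difference comes entirely from the frequent two-item query, which is why query frequencies matter when deciding which items to separate.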

“…For equal-sized data items, the total response time for a given query set can be minimized by evenly distributing the data items requested by each query across the disks as much as possible, while taking query frequencies into consideration. In [3], the declustering problem with a given query distribution is modeled as a max-cut partitioning of a weighted similarity graph. Here, data items are represented as vertices and an edge between two vertices indicates that corresponding data items appear in at least one query.…”

Section: Introduction (mentioning)

confidence: 99%
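The similarity-graph formulation in the statement above can be illustrated with a short sketch. Two points are assumptions rather than facts from the quoted text: edge weights are taken to be the summed frequencies of the queries in which both items appear, and the partitioner is a simple one-pass greedy heuristic chosen for brevity, not the algorithm of the cited paper. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_similarity_graph(queries, freqs):
    """Vertices are data items; an edge joins two items that co-occur in
    at least one query. Assumed weight: summed frequency of such queries."""
    weight = defaultdict(float)
    for q, f in zip(queries, freqs):
        for u, v in combinations(sorted(set(q)), 2):
            weight[(u, v)] += f
    return dict(weight)

def greedy_max_cut(items, weight, num_disks):
    """Greedy heuristic (an assumption, not the paper's method): place each
    item on the disk where it adds the least uncut weight, breaking ties by
    current disk load and then by disk index."""
    adj = defaultdict(dict)
    for (u, v), w in weight.items():
        adj[u][v] = w
        adj[v][u] = w
    disk_of, load = {}, [0] * num_disks
    for item in items:
        internal = [0.0] * num_disks
        for nbr, w in adj[item].items():
            if nbr in disk_of:
                internal[disk_of[nbr]] += w
        best = min(range(num_disks), key=lambda d: (internal[d], load[d], d))
        disk_of[item] = best
        load[best] += 1
    return disk_of

def cut_weight(weight, disk_of):
    """Total weight of edges whose endpoints sit on different disks."""
    return sum(w for (u, v), w in weight.items() if disk_of[u] != disk_of[v])

queries = [["a", "b"], ["c", "d"], ["a", "b", "c", "d"]]
freqs = [3, 3, 1]
graph = build_similarity_graph(queries, freqs)
placement = greedy_max_cut(["a", "b", "c", "d"], graph, 2)
```

Maximizing the cut pushes items that are frequently requested together onto different disks, which is the opposite of the min-cut objective used when the goal is clustering. A greedy pass like this one depends on the order in which items are visited and carries no optimality guarantee.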

Selective Replicated Declustering for Arbitrary Queries

Oktay, Türk, Aykanat (2009)

Lecture Notes in Computer Science


Abstract. Data declustering is used to minimize query response times in data intensive applications. In this technique, query retrieval process is parallelized by distributing the data among several disks and it is useful in applications such as geographic information systems that access huge amounts of data. Declustering with replication is an extension of declustering with possible data replicas in the system. Many replicated declustering schemes have been proposed. Most of these schemes generate two or more copies of all data items. However, some applications have very large data sizes and even having two copies of all data items may not be feasible. In such systems selective replication is a necessity. Furthermore, existing replication schemes are not designed to utilize query distribution information if such information is available. In this study we propose a replicated declustering scheme that decides both on the data items to be replicated and the assignment of all data items to disks when there is limited replication capacity. We make use of available query information in order to decide replication and partitioning of the data and try to optimize aggregate parallel response time. We propose and implement a Fiduccia-Mattheyses-like iterative improvement algorithm to obtain a two-way replicated declustering and use this algorithm in a recursive framework to generate a multi-way replicated declustering. Experiments conducted with arbitrary queries on real datasets show that, especially for low replication constraints, the proposed scheme yields better performance results compared to existing replicated declustering schemes.


“…This is achieved by reducing the number of disk accesses performed by a single disk of the architecture while answering a single query. Declustering has been shown to be an NP-complete problem in some contexts [1], [2].…”

Section: Related Work (mentioning)

confidence: 99%

Query-Log Aware Replicated Declustering

Türk, Oktay, Aykanat (2013)

IEEE Trans. Parallel Distrib. Syst.


Abstract. Data declustering and replication can be used to reduce I/O times related to the processing of data-intensive queries. Declustering parallelizes the query retrieval process by distributing the data items requested by queries among several disks. Replication enables alternative disk choices for individual disk items and thus provides better query parallelism options. In general, existing replicated declustering schemes do not consider query log information and try to optimize all possible queries for a specific query type, such as range or spatial queries. In such schemes, it is assumed that two or more copies of all data items are to be generated, and the scheduling of these copies to disks is discussed. However, in some applications, generation of even two copies of all of the data items is not feasible, since data items tend to have very large sizes. In this work, we assume that there is a given limit on disk capacities and thus on replication amounts. We utilize existing query-log information to propose a selective replicated declustering scheme, in which we select the data items to be replicated and decide on their scheduling onto disks while respecting disk capacities. We propose and implement an iterative improvement algorithm to obtain a two-way replicated declustering and use this algorithm in a recursive framework to generate a multiway replicated declustering. Then we improve the obtained multiway replicated declustering by efficient refinement heuristics. Experiments conducted on realistic data sets show that the proposed scheme yields better performance results compared to existing replicated declustering schemes.
