1996
DOI: 10.1016/0306-4379(96)00024-5

Partitioning similarity graphs: A framework for declustering problems

Abstract: Declustering problems are well-known in the databases for parallel computing environments. In this paper, we propose a new similarity-based technique for declustering data. The proposed method can adapt to the available information about query distribution (e.g. size, shape and frequency) and can work with alternative atomic data-types. Furthermore, the proposed method is flexible and can work with alternative data distributions, data sizes and partition-size constraints. The method is based on max-cut partition…

Cited by 52 publications (29 citation statements)
References 27 publications (40 reference statements)

“…To find a near optimal solution, their approach explores a solution space by adapting the large-neighborhood search technique. However, this approach and most of the approaches mentioned above are not well suited for our underlying scientific applications that are characterized by complex workload predicates involving many attributes; and this significantly degrades the efficiency of those approaches. Graph-based approaches have been used to capture more complex relations between the workload and the data both for partitioning with the objective of declustering [13,11] and clustering [8]. They use two different models to represent data and queries: simple graph and hypergraph.…”
Section: Effect Of Imbalance Factor and Data Correlation (mentioning)
confidence: 99%
“…In the hypergraph model [11], each query is modeled as a hyperedge (a set of vertices). In the simple graph model [13,8], queries are modeled as cliques of simple edges. Schism [8] is a recent system that partitions the data by building a graph containing the relations between queries and tuples.…”
Section: Effect Of Imbalance Factor and Data Correlation (mentioning)
confidence: 99%
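
The two query models contrasted in the statement above can be illustrated with a minimal Python sketch. The function names and the tuple identifiers are hypothetical and are not taken from the cited systems; the sketch only shows how the same workload looks under each representation.

```python
from itertools import combinations


def as_hyperedges(queries):
    """Hypergraph model: each query becomes one hyperedge (a set of vertices)."""
    return [frozenset(q) for q in queries]


def as_clique_edges(queries):
    """Simple graph model: each query becomes a clique of pairwise edges."""
    edges = set()
    for q in queries:
        # Every unordered pair of items touched by the query gets an edge.
        edges.update(combinations(sorted(set(q)), 2))
    return edges


# Hypothetical workload: two queries over four tuples.
queries = [["t1", "t2", "t3"], ["t3", "t4"]]
hyperedges = as_hyperedges(queries)
clique_edges = as_clique_edges(queries)
```

A query touching n items yields a single hyperedge but n(n-1)/2 clique edges, which is one reason the two models scale differently on wide queries.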
“…There are also a few studies that propose to exploit query distribution information [2], [3], [4], if such information is available. For equal-sized data items, the total response time for a given query set can be minimized by evenly distributing the data items requested by each query across the disks as much as possible, while taking query frequencies into consideration.…”
Section: Introduction (mentioning)
confidence: 99%
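
As a rough illustration of the objective described in this statement, the sketch below scores a placement under an assumed cost model: the response time of a query is the largest number of its items stored on any single disk, and the total is weighted by query frequency. The cost model, function names, and sample data are assumptions for illustration and are not taken from the citing paper.

```python
from collections import Counter
from math import ceil


def query_response_time(items, assignment):
    """Assumed cost: the busiest disk determines the response time of a query."""
    per_disk = Counter(assignment[i] for i in set(items))
    return max(per_disk.values(), default=0)


def total_response_time(queries, assignment):
    """Frequency-weighted sum of per-query response times."""
    return sum(freq * query_response_time(items, assignment)
               for items, freq in queries)


def lower_bound(queries, k):
    """Best case for equal-sized items: each query spread evenly over k disks."""
    return sum(freq * ceil(len(set(items)) / k) for items, freq in queries)


# Hypothetical workload: (items, frequency) pairs, placed on two disks.
queries = [(["a", "b"], 3), (["a", "c", "d"], 2)]
assignment = {"a": 0, "b": 1, "c": 0, "d": 1}
total = total_response_time(queries, assignment)
bound = lower_bound(queries, 2)
```

Under this model a placement is ideal for the given query set when the total equals the lower bound, as it does for the sample placement here.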
“…For equal-sized data items, the total response time for a given query set can be minimized by evenly distributing the data items requested by each query across the disks as much as possible, while taking query frequencies into consideration. In [3], the declustering problem with a given query distribution is modeled as a max-cut partitioning of a weighted similarity graph. Here, data items are represented as vertices and an edge between two vertices indicates that corresponding data items appear in at least one query.…”
Section: Introduction (mentioning)
confidence: 99%
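
The similarity-graph formulation summarized in this statement can be sketched in Python as follows. Two things here are assumptions rather than details from the source: edge weights are taken to be the summed frequencies of the queries containing both items, and the partitioner is a simple balanced greedy heuristic standing in for whatever max-cut method the paper uses. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import ceil


def build_similarity_graph(queries):
    """queries: list of (items, frequency). Returns {(u, v): weight} with u < v."""
    weights = defaultdict(float)
    for items, freq in queries:
        # Items that appear together in a query are connected (assumed weighting).
        for u, v in combinations(sorted(set(items)), 2):
            weights[(u, v)] += freq
    return dict(weights)


def greedy_max_cut(items, weights, k):
    """Greedy heuristic (not the paper's algorithm): place each item on the
    disk where it adds the least intra-disk weight, with ceil(n/k) capacity."""
    capacity = ceil(len(items) / k)
    neighbors = defaultdict(dict)
    for (u, v), w in weights.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    assignment, loads = {}, [0] * k
    # Process the most heavily connected items first.
    order = sorted(items, key=lambda x: -sum(neighbors[x].values()))
    for item in order:
        cost = [0.0] * k
        for other, w in neighbors[item].items():
            if other in assignment:
                cost[assignment[other]] += w
        open_disks = [d for d in range(k) if loads[d] < capacity]
        best = min(open_disks, key=lambda d: (cost[d], loads[d], d))
        assignment[item] = best
        loads[best] += 1
    return assignment


def cut_weight(weights, assignment):
    """Total weight of edges whose endpoints are on different disks."""
    return sum(w for (u, v), w in weights.items()
               if assignment[u] != assignment[v])


# Hypothetical workload: (items, frequency) pairs, declustered over two disks.
queries = [(["a", "b"], 3), (["b", "c"], 1), (["a", "c", "d"], 2)]
weights = build_similarity_graph(queries)
assignment = greedy_max_cut(["a", "b", "c", "d"], weights, k=2)
cut = cut_weight(weights, assignment)
```

Maximizing the cut pushes items that are frequently requested together onto different disks, which is the declustering goal. The greedy pass gives no optimality guarantee, so it is best read as a baseline.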
“…This is achieved by reducing the number of disk accesses performed by a single disk of the architecture while answering a single query. Declustering has been shown to be an NP-complete problem in some contexts [1], [2].…”
Section: Related Work (mentioning)
confidence: 99%