Data distribution and the degree of data replication are key factors in determining the performance of distributed database systems. To simplify the evaluation of performance measures, database designers and researchers tend to make unrealistic assumptions about the data distribution and replication in a system, and little is known about the impact of these assumptions on the final design. In this paper, we investigate the effect of such assumptions on the performance measures as well as on the computational complexity and accuracy of their evaluation. We choose the size of the participating node set of a transaction as the performance measure of interest. Probabilistic analysis is employed to evaluate six models. We conclude that even though some of the data distribution and replication models appear simplistic, the results obtained from them are very close to those from more complex models.

* This research was supported in part by an Old Dominion University Faculty Research Fellowship.

1. Introduction

A distributed database system is a collection of cooperating nodes, each containing a set of data items. A user transaction can enter such a system at any of these nodes. The receiving node, sometimes referred to as the coordinating or initiating node, undertakes the task of locating the nodes that contain the data items required by the transaction. The set of nodes selected for the execution of such a transaction is called the participating node set of that transaction.

The three most important performance measures of interest to a distributed database system designer are the average transaction response time, the transaction availability, and the system cost (e.g., communication cost, storage cost, etc.). Even though it is possible to formulate equations expressing these measures in terms of the system parameters, evaluating these measures exactly is extremely cumbersome, requires unreasonably long computation times, and generally involves both analysis and simulation. Evaluation tools with such long execution times are certainly not acceptable to a database designer who needs to evaluate a number of candidate database configurations before arriving at a final design.

To overcome these problems, designers and researchers generally resort to approximation techniques [3,4]. These techniques reduce the computation time by making simplifying assumptions regarding data distribution, data replication, and transaction execution. The time complexity of these techniques depends heavily on the underlying model as well as on the evaluation technique.

In this paper, we study the effect of data distribution and data replication models on the accuracy of the corresponding performance measures. Because the size of the participating node set of a transaction is central to computing the above performance measures, this paper evaluates the models with respect to this measure. We employ probabilistic analysis to arrive at the desired estimates for six typical models. ...
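To make the quantity under study concrete, the sketch below shows how the expected size of the participating node set could be estimated by Monte Carlo simulation under one hypothetical model: each data item is replicated at a fixed number of uniformly random nodes, and a transaction contacts one randomly chosen copy of each item it accesses. This is an illustrative assumption of ours, not one of the six models analyzed in the paper, and all function and parameter names are invented for the example.

import random

def mean_participating_set_size(num_nodes, num_items, copies_per_item,
                                items_per_txn, trials=10_000, seed=0):
    # Monte Carlo estimate of the expected participating node set size.
    # Hypothetical model: each item is replicated at copies_per_item nodes
    # chosen uniformly at random; a transaction reads items_per_txn distinct
    # items and contacts one randomly chosen copy of each.
    rng = random.Random(seed)
    nodes = range(num_nodes)
    total = 0
    for _ in range(trials):
        # Place the copies of every item at a random subset of the nodes
        # (placement is redrawn each trial, i.e., averaged over placements).
        placement = [rng.sample(nodes, copies_per_item)
                     for _ in range(num_items)]
        # The transaction accesses a random subset of the items.
        accessed = rng.sample(range(num_items), items_per_txn)
        # One copy per accessed item; the union of the chosen nodes is the
        # participating node set of this transaction.
        participants = {rng.choice(placement[i]) for i in accessed}
        total += len(participants)
    return total / trials

if __name__ == "__main__":
    # Example: 10 nodes, 100 items, 3 copies each, 5 items per transaction.
    print(mean_participating_set_size(num_nodes=10, num_items=100,
                                      copies_per_item=3, items_per_txn=5))

Probabilistic analysis, as used in the paper, replaces such simulation with closed-form expectations; the simulation merely illustrates what is being estimated.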