The B-tree and its variants have been found to be highly useful (both theoretically and in practice) for storing large amounts of information, especially on secondary storage devices. We examine the problem of overcoming the inherent difficulty of concurrent operations on such structures, using a practical storage model. A single additional "link" pointer in each node allows a process to easily recover from tree modifications performed by other concurrent processes. Our solution compares favorably with earlier solutions in that the locking scheme is simpler (no read-locks are used) and only a (small) constant number of nodes are locked by any update process at any given time. An informal correctness proof for our system is given.
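To illustrate how a link pointer can let a reader recover from a concurrent split, here is a minimal single-threaded sketch of the read path only. It is not the paper's algorithm: the names `Node`, `high_key`, `right`, `move_right`, and `search` are hypothetical, and the sketch assumes each node also records an upper bound on the keys it may hold. No locking is shown, and the update path is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node layout, chosen for illustration only.
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf
    high_key: Optional[int] = None   # assumed upper bound on keys; None = unbounded
    right: Optional["Node"] = None   # the "link" pointer to the right sibling


def move_right(node: Node, key: int) -> Node:
    """Follow link pointers while the key lies beyond this node's bound."""
    while (node.high_key is not None and key > node.high_key
           and node.right is not None):
        node = node.right
    return node


def search(root: Node, key: int) -> bool:
    """Descend to a leaf, correcting for splits the parent has not yet seen."""
    node = move_right(root, key)
    while node.children:
        # children[i] covers keys <= keys[i]; the last child covers the rest.
        idx = 0
        while idx < len(node.keys) and key > node.keys[idx]:
            idx += 1
        node = move_right(node.children[idx], key)
    return key in node.keys
```

In this sketch, a reader that arrives at a node whose contents have moved to a new right sibling simply follows `right` until the key falls within the node's bound, so a stale parent entry still leads to the correct leaf.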
When data records are grouped into blocks in secondary storage, it is frequently necessary to estimate the number of blocks $X_D$ accessed for a given query. In a recent paper [1], Cardenas gave the expression

$$X_D = m\left(1 - (1 - 1/m)^k\right) \tag{1}$$

assuming that there are n records divided into m blocks and that the k records satisfying the query are distributed uniformly among the m blocks. The derivation of the expression was left to the reader as an exercise.

Let us take a closer look at the expression. $(1 - 1/m)$ gives the probability that a particular block does not contain a particular record. If k records are selected independently, then the probability that a particular block is not "hit" is $(1 - 1/m)^k$. Hence $1 - (1 - 1/m)^k$ gives the probability that a particular block is "hit," and the expression follows.

The assumption that the k records are selected independently implies selection with replacement. Since a record may be selected more than once, the k records may not be distinct. This is not valid in the case of a query access which retrieves all k distinct records at one time. In fact, Rothnie and Lozano showed that the result of eq. (1) gives the lower bound of the expected number of blocks accessed [2]. A more accurate analysis based on selection without replacement was given by Severance, but the precision problem makes the expression obtained computationally intractable (Appendix D in [3]). A similar approach by Siler results in a rather complicated recursive formula which can be computed (Appendix B in [4]). Another recursive formula was given by [3]. Using a different approach, a simple closed form was obtained by Yao in a different context [5]. The resulting expression was used in several applications [5,6,7] to estimate the expected number of data blocks accessed. A comparison with the Cardenas approximation shows that this refinement is significant when the blocking factor n/m is small. For large blocking factors (e.g. n/m > 10), the error involved in Cardenas' approximation is practically negligible.

THEOREM (Yao). Given n records grouped into m blocks $(1 < m \le n)$, each containing n/m records. If k records $(k \le n - n/m)$ are randomly selected from the n records, the expected number of blocks hit (blocks with at least one record selected) is given by

$$m\left[1 - \prod_{i=1}^{k} \frac{nd - i + 1}{n - i + 1}\right], \qquad \text{where } d = 1 - 1/m.$$
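The two estimates can be compared numerically. The following is a minimal sketch: the function names `cardenas_blocks` and `yao_blocks` are hypothetical, and the second function assumes that n is an exact multiple of m and evaluates the without-replacement expectation as m times one minus the probability that a given block is missed, C(n − n/m, k) / C(n, k), written as a running product.

```python
def cardenas_blocks(m: int, k: int) -> float:
    """Estimate assuming selection with replacement: m * (1 - (1 - 1/m)^k)."""
    return m * (1.0 - (1.0 - 1.0 / m) ** k)


def yao_blocks(n: int, m: int, k: int) -> float:
    """Expected blocks hit when k distinct records are drawn from n records."""
    if n % m != 0:
        raise ValueError("this sketch assumes n is a multiple of m")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    per_block = n // m
    if k > n - per_block:
        return float(m)  # too many records selected to miss any block
    # Probability that one particular block contains none of the k records.
    miss = 1.0
    for i in range(1, k + 1):
        miss *= (n - per_block - i + 1) / (n - i + 1)
    return m * (1.0 - miss)


if __name__ == "__main__":
    # Blocking factor 1 versus blocking factor 100.
    for n, m, k in [(10, 10, 3), (1000, 10, 5)]:
        print(n, m, k, cardenas_blocks(m, k), yao_blocks(n, m, k))
```

With a blocking factor of 1 (n = m = 10, k = 3), the without-replacement value is exactly 3 blocks, while the with-replacement estimate is about 2.71. With a blocking factor of 100, the two values differ by well under 0.05 blocks, in line with the remark that the error is negligible for large blocking factors.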
A generalized model for physical database organizations is presented. Existing database organizations are shown to fit easily into the model as special cases. Generalized access algorithms and cost equations associated with the model are developed and analyzed. The model provides a general design framework in which the distinguishing properties of database organizations are made explicit and their performances can be compared.