Abstract: Batching yields significant savings in access costs in sequential, tree-structured, and random files. A direct and simple expression is developed for computing the average number of records/pages accessed to satisfy a batched query of a sequential file. The advantages of batching for sequential and random files are discussed. A direct equation is provided for the number of nodes accessed in unbatched queries of hierarchical files. An exact recursive expression is developed for node accesses in batched queries of hierarchical files. In addition to the recursive relationship, good closed-form upper- and lower-bound approximations are provided for the case of batched queries of hierarchical files.

[4,7,8,11]. Different organization structures and access methods are relevant, depending on the usage requirements of the data stored in the files and in the database. In today's proliferation of on-line, fast-response systems, it is very common to have random (hash-based) and indexed file organizations with fast, direct access to individual records in the file. However, in batch applications and some on-line applications, it may be desirable to sequentially search for a batch of records in the file. Shneiderman and Goodman [9] have shown the desirability of batched searches in sequential and tree-structured organizations.

This paper refines and extends the expressions and results reported by Shneiderman and Goodman [9] and later by Batory and Gotlieb [1]. Shneiderman and Goodman developed expressions to show the savings due to batching in sequential and tree organizations. They did not find exact closed-form expressions; their expressions were complex recursive relations, although they did find a closed-form lower-bound estimate for sequential files. Batory and Gotlieb speculated on the form of the expression for the number of node accesses in a sequential search on the basis of the work in [9]. A direct approach is taken here.
Rather than obtaining the savings due to batching, explicit and accurate expressions are derived for the cost of batching. The cost of batching can then be compared to the cost of any other type of search. (Note that the above authors [1,9] compare the cost of a batch of k requests to the cost of k individual searches.) The benefits of these expressions are threefold: first, the new equations are exact and closed-form (nonrecursive, noniterative) in the sequential case; second, closed-form equations are simpler to use in any further or related work; and, finally, savings due to batching can be obtained in comparison with any other search technique.

Expressions are developed first for sequential files and then for hierarchically structured files.
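
To make the comparison between one batched scan and k individual searches concrete, the following is a minimal sketch and not the expression derived in this paper. It rests on assumptions of ours: the file holds n records, the k requested records are distinct and every k-subset is equally likely, a batched scan starts at record 1 and stops at the last requested record, and each unbatched search restarts at record 1. The function names are hypothetical. Under these assumptions the batched cost is the expected maximum of a random k-subset of 1..n, which has the well-known value k(n+1)/(k+1); the code checks this by brute-force enumeration.

```python
from fractions import Fraction
from itertools import combinations


def batched_scan_cost(n: int, k: int) -> Fraction:
    """Exact mean number of records read when one scan serves k distinct keys.

    Assumption (ours, not the paper's): the scan starts at record 1 and stops
    at the last requested record; every k-subset of the n records is equally
    likely to be requested.
    """
    subsets = list(combinations(range(1, n + 1), k))
    return Fraction(sum(max(s) for s in subsets), len(subsets))


def unbatched_scan_cost(n: int, k: int) -> Fraction:
    """Mean total cost of k separate scans, each restarting at record 1.

    A single search for a uniformly placed record reads (n + 1) / 2 records
    on average, so k searches cost k * (n + 1) / 2 by linearity.
    """
    return k * Fraction(n + 1, 2)


n, k = 10, 3
batched = batched_scan_cost(n, k)
unbatched = unbatched_scan_cost(n, k)

# The enumeration agrees with the closed form k(n+1)/(k+1).
assert batched == Fraction(k * (n + 1), k + 1)
print(batched, unbatched)  # prints: 33/4 33/2
```

For n = 10 and k = 3, the batched scan reads 8.25 records on average against 16.5 for three individual searches. Exact fractions are used so that the enumeration can be compared with the closed form without rounding error. The enumeration is only practical for small n and k.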