Approximating block accesses in database organizations

Yao, S. Bing

doi:10.1145/359461.359475

Cited by 316 publications

(96 citation statements)

References 5 publications


“…Next, let's preserve the clustering structure and distribute all documents randomly to these clusters. The average number of target clusters for this case is shown by n_tr and its value can be calculated without creating random clusters by the modified form [Can, Ozkarahan, 1990] of Yao's formula [Yao, 1977]; however, for the validity decision we need the distribution of the n_tr values. The case n_t > n_tr suggests that the tested clustering structure is invalid, since it is unsuccessful in placing the documents relevant to the same query into a fewer number of clusters than that of the average random case.…”

Section: Validation Of the Generated Clustering Structure (mentioning)

confidence: 99%

Efficiency and effectiveness of query processing in cluster-based retrieval

Can

Altıngövde

Demir

2004

Information Systems


Our research shows that for large databases, without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of the inverted index-based full search (FS). The proposed CBR method employs a storage structure that blends the cluster membership information into the inverted file posting lists. This approach significantly reduces the cost of similarity calculations for document ranking during query processing and improves efficiency. For example, in terms of in-memory computations, our new approach can reduce query processing time to 39% of FS. The experiments confirm that the approach is scalable and system performance improves with increasing database size. In the experiments, we use the Cover Coefficient-based Clustering Methodology (C3M), and the Financial Times database of TREC-4 containing 210,158 documents of size 564 MB defined by 229,748 terms with a total of 29,545,234 inverted index elements. This study provides CBR efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user behavior assumption for clustering.
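The random-assignment baseline described in the citation statement above can be sketched in Python. This is an illustration under stated assumptions, not the cited authors' code: the function names are hypothetical, a "target cluster" is taken to mean a cluster holding at least one document relevant to the query, and the closed form is derived here from linearity of expectation rather than copied from [Can, Ozkarahan, 1990].

```python
import random
from math import prod


def expected_target_clusters(sizes, k):
    """Expected number of clusters hit by k relevant documents when all
    documents are assigned at random and cluster sizes are preserved.
    (Hypothetical helper; closed form via linearity of expectation.)"""
    m = sum(sizes)  # total number of documents
    total = 0.0
    for s in sizes:
        if k > m - s:
            total += 1.0  # too many relevant documents to miss this cluster
        else:
            # P(cluster missed) = C(m - s, k) / C(m, k), written as a product
            miss = prod((m - s - i) / (m - i) for i in range(k))
            total += 1.0 - miss
    return total


def sample_target_clusters(sizes, k, trials, seed=0):
    """Monte Carlo draws of the target-cluster count under random assignment.
    Shuffling all documents is equivalent to placing the k relevant ones
    in a uniformly random k-subset of the m document slots."""
    rng = random.Random(seed)
    m = sum(sizes)
    cluster_of = [j for j, s in enumerate(sizes) for _ in range(s)]
    counts = []
    for _ in range(trials):
        slots = rng.sample(range(m), k)
        counts.append(len({cluster_of[d] for d in slots}))
    return counts
```

The sampled counts give the distribution that the statement says is needed for the validity decision; an observed target-cluster count can then be compared against that distribution instead of against the mean alone.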


“…Cardenas [7], e.g., gives Equation 7 to estimate the distinct accessed records when accessing one of R.n records r times. Whilst challenged repeatedly for special cases [13], [34], [9], we found the equation yields virtually identical results to the equation from the original cost model while being much cheaper to compute.…”

Section: Extensions To the Generic Cost Model (mentioning)

confidence: 88%

CPU and cache efficient management of memory-resident databases

Pirk

Funke

Grund

et al. 2013

2013 IEEE 29th International Conference on Data Engineering (ICDE)


Abstract: Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current implementations, bandwidth savings achieved by partial decomposition come at increased CPU costs. To achieve the aspired bandwidth savings without sacrificing CPU efficiency, we combine partially decomposed storage with Just-in-Time (JiT) compilation of queries, thus eliminating CPU inefficient function calls. Since existing cost based optimization components are not designed for JiT-compiled query execution, we also develop a novel approach to cost modeling and subsequent storage layout optimization. Our evaluation shows that the JiT-based processor maintains the bandwidth savings of previously presented hybrid query processors but outperforms them by two orders of magnitude due to increased CPU efficiency.


“…This expression was shown by Palvia and March [6] to be an overall better estimator than the prevalently used approximation by Cardenas [2], and computationally more efficient than the exact expression by Yao [10].…”

Section: Sequential Files (mentioning)

confidence: 88%

Expressions for batched searching of sequential and hierarchical files

Palvia

1985

ACM Trans. Database Syst.


Abstract: Batching yields significant savings in access costs in sequential, tree-structured, and random files. A direct and simple expression is developed for computing the average number of records/pages accessed to satisfy a batched query of a sequential file. The advantages of batching for sequential and random files are discussed. A direct equation is provided for the number of nodes accessed in unbatched queries of hierarchical files. An exact recursive expression is developed for node accesses in batched queries of hierarchical files. In addition to the recursive relationship, good, closed-form upper- and lower-bound approximations are provided for the case of batched queries of hierarchical files.

[4,7,8,11]. Different organization structures and access methods are relevant, depending on the usage requirements of the data stored in the files and in the database. In today's proliferation of on-line, fast-response systems, it is very common to have random (hash-based) and indexed file organizations with fast, direct access to individual records in the file. However, in batch applications and some online applications, it may be desirable to sequentially search a batch of records in the file. Shneiderman and Goodman [9] have shown the desirability of batched searches in sequential and tree structure organizations. This paper refines and extends the expressions and results reported by Shneiderman and Goodman [9] and later by Batory and Gotlieb [1]. Shneiderman and Goodman developed expressions to show the savings due to batching in sequential and tree organizations. They did not, however, find exact closed-form expressions. Their expressions were complex recursive relations; they did, however, find a closed-form lower-bound estimate for sequential files. Batory and Gotlieb speculated on the form of the expression for the number of node accesses in a sequential search on the basis of the work in [9].

A direct approach is taken here. Rather than obtaining savings due to batching, explicit and accurate expressions are derived for the cost of batching. Then, the cost of batching can be compared to the cost of any other type of search. (Note that the above authors [1,9] compare the cost of batched k requests to the cost of k individual searches.) The benefits of these expressions are threefold: first, the new equations are exact and closed-form (nonrecursive, noniterative) in the sequential case; second, closed-form equations are easier and simpler to use in any further or related work; and, finally, savings due to batching can be obtained in comparison with any other search technique. Expressions are developed first for the sequential files and then for hierarchically structured files.
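The two estimators that the citation statements above contrast, Cardenas' approximation and Yao's exact expression, can be compared with a short Python sketch. The function and parameter names are illustrative assumptions: n is the number of records, m the number of equal-sized blocks, and k the number of records selected. The sketch does not cover the Palvia and March expression, which is not given in the text.

```python
from math import prod


def cardenas(m: int, k: int) -> float:
    """Cardenas' approximation: expected number of distinct blocks touched
    when k selections fall independently and uniformly on m blocks
    (selection with replacement)."""
    return m * (1.0 - (1.0 - 1.0 / m) ** k)


def yao(n: int, m: int, k: int) -> float:
    """Yao's exact expression: expected number of blocks touched when k
    distinct records are chosen without replacement from n records
    stored n/m per block."""
    if n % m != 0:
        raise ValueError("this sketch assumes equal-sized blocks (m divides n)")
    per_block = n // m
    if k > n - per_block:
        return float(m)  # so many records are selected that no block is missed
    # Probability that one given block contributes none of the k records
    miss = prod((n - per_block - i) / (n - i) for i in range(k))
    return m * (1.0 - miss)
```

For a large file the two agree closely: with n = 1,000,000, m = 1,000 and k = 50 they differ by well under 0.01 blocks, which is consistent with the remark that the cheaper formula yields virtually identical results. For small blocks they diverge, since sampling without replacement can only spread records over more blocks.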
