We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an e cient algorithm that generates all signi cant association rules between items in the database. The algorithm incorporates bu er management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the e ectiveness of the algorithm.
Organizing massive amount of data on wireless communication networks in order to provide fast and low power access to users equipped with palmtops, is a new challenge to the data management and telecommunication communities. Solutions must take under consideration the physical restrictions of low network bandwidth and limited battery life of palmtops. This paper proposes algorithms for multiplexing clustering and nonclustering indexes along with data on wireless networks. The power consumption and the latency for obtaining the required data are considered as the two basic performance criteria for all algorithms. First, this paper describes two algorithms namely, (1, m) Indexing and Distributed Indexing, for multiplexing data and its clustering index. Second, an algorithm called Nonclustered Indexing is described for allocating static data and its corresponding nonclustered index. Then, the Nonclustered indexing algorithm is generalized to the case of multiple indexes. Finally, the proposed algorithms are analytically demonstrated to lead to significant improvement of battery life while retaining a low latency.
We present our perspective of database mining as the con uence of machine learning techniques and the performance emphasis of database technology. W e describe three classes of database mining problems involving classi cation, associations, and sequences, and argue that these problems can be uniformly viewed as requiring discovery of rules embedded in massive data. We describe a model and some basic operations for the process of rule discovery. W e show h o w the database mining problems we consider map to this model and how they can be solved by using the basic operations we propose. We give an example of an algorithm for classi cation obtained by combining the basic rule discovery operations. This algorithm not only is e cient in discovering classi cation rules but also has accuracy comparable to ID3, one of the current best classi ers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.