Data stream mining algorithms operate under constraints on the space used and the time taken, which follow from the streaming nature of the data. The tolerance in these constraints is inversely proportional to the streaming speed of the data. Because caching and mining streaming data are sensitive to these constraints, this paper devises a scalable, memory-efficient model for caching and frequent itemset mining. The proposed model is an incremental approach that builds single-level, multi-node trees, called bushes, from each window of the streaming data; henceforth we refer to the proposed algorithm as Tree (bush) based Incremental Frequent Itemset Mining (TIFIM) over data streams.
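To make the window-by-window, incremental idea concrete, the following is a minimal sketch of generic windowed frequent itemset counting. It is not the TIFIM algorithm and does not build bushes; the function names `window_counts` and `mine_stream`, the `max_size` cap on itemset length, and the use of plain counters are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def window_counts(window, max_size=2):
    """Count every itemset of up to max_size items in one window of transactions."""
    counts = Counter()
    for transaction in window:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def mine_stream(stream, window_size, min_support, max_size=2):
    """Read the stream one window at a time and merge each window's counts
    into a running total, so earlier transactions are never rescanned."""
    total = Counter()
    seen = 0
    window = []
    for transaction in stream:
        window.append(transaction)
        if len(window) == window_size:
            total.update(window_counts(window, max_size))
            seen += len(window)
            window = []
    if window:  # flush a final, partially filled window
        total.update(window_counts(window, max_size))
        seen += len(window)
    threshold = min_support * seen
    return {itemset: c for itemset, c in total.items() if c >= threshold}


# Hypothetical four-transaction stream, processed in windows of two.
stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
frequent = mine_stream(stream, window_size=2, min_support=0.5)
```

Only the current window is held in memory at any time. The running totals in this sketch grow with the number of distinct itemsets, which is the kind of cost a compact per-window structure would aim to reduce.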
Mining data streams has recently become an important and active research area, and is increasingly widespread in several fields of computer science and engineering. It has been applied successfully in many domains, such as wireless sensor networks, ATM transactions, search engines, web analysis, and weather monitoring. Data stream mining can be considered a subfield of machine learning, data mining, and knowledge discovery. Data mining is a step in the process of knowledge discovery from large amounts of data. Traditional data mining techniques cannot be easily applied to data stream mining because of the unique characteristics of data streams. In this work, we survey the main techniques and applications of data mining and data stream mining. We then study the computational and mining challenges, in particular those of online mining of continuous, high-speed, massive data streams.
Continuous prediction of closed frequent itemsets from high-speed distributed data streams is an active research area, because of the conflict between the processing time needed to mine consistent itemsets from current records and the high transmission speed of data streams. Motivated by our earlier proposed models, we devise a novel closed frequent itemset mining model for high-speed distributed data streams. The model is referred to as Parallel Closed Frequent Itemsets Mining (PCFIM) over High Speed Distributed Data Streams by Manifold Varying Size Windows (MVSW). The experimental results indicate that the proposed PCFIM is scalable and robust on high-speed data streams and outperforms existing benchmark models.
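For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The sketch below illustrates only that standard definition; it is not the PCFIM algorithm, and the function name `closed_itemsets` and the input format (a dictionary mapping item tuples to support counts, assumed to contain every frequent itemset) are assumptions made for illustration.

```python
def closed_itemsets(supports):
    """Keep only the itemsets that have no proper superset with equal support.

    `supports` maps each frequent itemset (a tuple of items) to its support count.
    """
    closed = {}
    for itemset, count in supports.items():
        items = set(itemset)
        has_equal_superset = any(
            count == other_count and items < set(other)  # `<` is proper subset
            for other, other_count in supports.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


# Hypothetical supports: ("b",) always occurs together with "a",
# so it is absorbed by ("a", "b"), which has the same support.
supports = {("a",): 3, ("b",): 2, ("a", "b"): 2}
closed = closed_itemsets(supports)
```

This quadratic filter is meant only to show what is being mined; on a high-speed stream a miner would need to maintain the closed sets incrementally rather than recompute them.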