Sparseness is a distinctive trait of the big data generated by many present-day applications. Moreover, real-world sparse datasets often contain many similar records. The recently proposed TRICE algorithm, based on the Iterative Trimmed Transaction Lattice (ITTL), mines frequent itemsets efficiently from sparse datasets. TRICE stores similar transactions only once and then eliminates the infrequent part of each distinct transaction. However, removing the infrequent parts of two or more distinct transactions may yield identical trimmed transactions. TRICE repeatedly generates ITTLs for such similar trimmed transactions, inducing redundant computations that degrade runtime efficiency. This paper presents D-GENE, a technique that optimizes TRICE by introducing a deferred ITTL generation mechanism. D-GENE suspends ITTL generation until the transaction-pruning phase completes. This deferral strategy enables D-GENE to generate the ITTL of each set of similar trimmed transactions only once. Experimental results show that, by avoiding these redundant computations, D-GENE achieves better runtime efficiency. D-GENE comprehensively outperforms TRICE, FP-growth, and optimized versions of the SaM and RElim algorithms, especially when the gap between the number of distinct transactions and the number of distinct trimmed transactions is large.

INDEX TERMS Big data applications, pattern recognition, association rules, frequent itemset mining, IoT.
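The core optimization the abstract describes can be illustrated with a short sketch. This is not the actual D-GENE implementation (the ITTL construction itself is omitted); it only shows, under assumed data structures, how trimming the infrequent part of distinct transactions first and deduplicating afterward ensures each trimmed transaction is processed once:

```python
from collections import Counter

def trim_and_dedupe(transactions, min_support):
    """Illustrative sketch of deferred deduplication (not D-GENE itself):
    trim infrequent items from each distinct transaction, then merge
    identical trimmed transactions so each is processed exactly once."""
    # Count item frequencies over the whole database.
    freq = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in freq.items() if c >= min_support}

    # Store similar (identical) transactions once, with a multiplicity.
    distinct = Counter(frozenset(t) for t in transactions)

    # Trimming may collapse different distinct transactions into the
    # same trimmed transaction; deferring further processing until
    # after this pass avoids generating their lattices twice.
    trimmed = Counter()
    for t, count in distinct.items():
        trimmed[frozenset(t & frequent)] += count
    trimmed.pop(frozenset(), None)  # drop transactions emptied by trimming
    return trimmed

db = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "e"}, {"a", "c"}]
result = trim_and_dedupe(db, min_support=2)
```

Here the distinct transactions `{a,b,d}` and `{a,b,e}` both trim to `{a,b}`, which is recorded once with multiplicity 2, so any downstream lattice generation runs only once for it.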
Sparseness is often witnessed in big data emanating from a variety of sources, including IoT, pervasive computing, and behavioral data. Frequent itemset mining is the first and foremost step of association rule mining, a well-known unsupervised machine learning problem. However, frequent itemset mining techniques remain under-explored for sparse real-world data, on which they show roughly comparable performance. By contrast, these methods are well validated on dense data, where their performance differs markedly. Hence, there is a pressing need both to evaluate existing techniques and to propose new ones for large sparse real-world datasets. In this study, a novel method, Mining Frequent Itemsets by Iterative TRimmed Transaction lattICE (TRICE), is proposed. TRICE iteratively generates combinations of varying-sized trimmed subsets of I, where I denotes the set of distinct items in a database. Extensive experiments assess TRICE against the HARPP, FP-Growth, optimized SaM, and optimized RElim algorithms. The experimental results show that TRICE outperforms all of these algorithms in both running time and memory consumption, maintaining a substantial performance margin on all sparse real-world datasets across all minimum support thresholds. Moreover, the memory use of the optimized SaM and RElim algorithms is assessed for the first time.

INDEX TERMS Association rules, big data applications, data mining, frequent itemset mining, pattern recognition, pervasive computing.
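For readers unfamiliar with the problem TRICE solves, a naive frequent-itemset miner makes the input/output contract concrete. This sketch enumerates subsets of trimmed transactions directly; it is deliberately simple and is not TRICE's trimmed-transaction-lattice algorithm, which is far more efficient on the same task:

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Naive frequent-itemset miner, for illustration only."""
    # First pass: find the frequent single items.
    freq = Counter(item for t in transactions for item in t)
    frequent_items = {i for i, c in freq.items() if c >= min_support}

    # Second pass: enumerate subsets of each trimmed transaction and
    # accumulate their supports across the database.
    support = Counter()
    for t in transactions:
        trimmed = sorted(set(t) & frequent_items)
        for k in range(1, len(trimmed) + 1):
            for combo in combinations(trimmed, k):
                support[combo] += 1
    return {s: c for s, c in support.items() if c >= min_support}

db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
result = frequent_itemsets(db, min_support=2)
```

On this toy database, every single item and every pair meets the support threshold of 2, while the triple `(a, b, c)` occurs only once and is discarded. The exponential subset enumeration here is exactly the cost that lattice-based methods such as TRICE are designed to avoid.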
Researchers from the University of Washington and Pakistan are using 21st-century technology to revive farming as a profitable profession in the Indus Valley.
Vehicular Named Data Networking (VNDN) is considered a strong paradigm for vehicular applications. In VNDN, each node has its own cache, but limited cache capacity directly hurts performance in a highly dynamic environment that demands massive, fast content delivery. Cooperative caching plays an efficient role in mitigating these issues in VNDN. Most studies of cooperative caching focus on content-replacement and caching algorithms and evaluate them in a static rather than a dynamic environment; moreover, few existing approaches address cache diversity and latency in VNDN. This paper proposes a Dynamic Cooperative Cache Management Scheme (DCCMS) based on social and popular data, which improves cache efficiency and operates in a dynamic environment. We design a two-level dynamic caching scheme that chooses, as the caching node, a node that frequently communicates with other nodes, keeps a copy of the most popular content, and distributes it to requester nodes when needed. The main aim of DCCMS is to improve cache performance by reducing latency, server load, and average hop count, and by improving cache hit ratio, cache utilization, and diversity. Simulation results show that the proposed DCCMS scheme improves cache performance over other state-of-the-art approaches.
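The keep-the-most-popular-content idea behind such schemes can be sketched minimally. The class and helper below are hypothetical illustrations, not the paper's DCCMS (which additionally selects caching nodes by social ties and operates at two levels in a mobile topology); they only show popularity-based eviction plus a cooperative local-then-neighbor-then-server lookup:

```python
from collections import Counter

class PopularityCache:
    """Toy cache that retains the most-requested content (sketch only)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}
        self.requests = Counter()  # popularity = observed request count

    def get(self, name):
        self.requests[name] += 1
        return self.store.get(name)

    def put(self, name, content):
        if name in self.store or len(self.store) < self.capacity:
            self.store[name] = content
            return
        # Evict the least popular cached item only if the newcomer is hotter.
        victim = min(self.store, key=lambda n: self.requests[n])
        if self.requests[name] > self.requests[victim]:
            del self.store[victim]
            self.store[name] = content

def fetch(name, local, neighbor, server):
    """Cooperative lookup (hypothetical helper): try the local cache,
    then a neighboring node's cache, then fall back to the server."""
    return local.get(name) or neighbor.get(name) or server[name]
```

Serving requests from a nearby node's cache instead of the origin server is what reduces latency, server load, and hop count in cooperative schemes; popularity-aware eviction keeps the hit ratio high under a small per-node cache.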