Stable periodic-frequent itemset mining is essential in big data analytics and has many real-world applications. It involves extracting all itemsets that exhibit stable periodic behavior in a temporal database. Most previous studies focused on finding these itemsets in row (temporal) databases and disregarded their occurrence in columnar databases. Furthermore, the naïve approach of transforming a columnar database into a row database and then applying existing algorithms to find interesting itemsets is impractical because of its computational cost. With this motivation, this paper proposes a framework to discover stable periodic-frequent itemsets in columnar databases. Our framework employs a novel depth-first search algorithm that compresses a given columnar database into a unified dictionary and mines it recursively to find all stable periodic-frequent itemsets. The dictionary holds the information pertaining to itemsets and their temporal occurrences in the database. Experimental results on six databases demonstrate that the proposed algorithm is computationally efficient and scalable.

INDEX TERMS Columnar databases, stable periodic-frequent itemset, itemset mining.
I. INTRODUCTION

The associate editor coordinating the review of this manuscript and approving it for publication was Laura Celentano.

¹ Columnar and row databases are also referred to as vertical and horizontal databases, respectively.
² ACID is an acronym for Atomicity, Consistency, Isolation, and Durability.
³ BASE is an acronym for Basically Available, Soft state, and Eventually consistent.

Database systems play a crucial role in storing the big data generated by real-world applications. Depending on the layout used for storing the data, databases can be broadly classified into two types: row databases and columnar databases.¹ Row databases are primarily based on ACID² properties and organize the data as records, keeping the data associated with a record contiguous on a storage device. Popular row databases include MySQL [1] and Postgres [2]. In contrast, columnar databases are based on BASE³ properties and organize data into fields and store