D-GENE: Deferring the GENEration of Power Sets for Discovering Frequent Itemsets in Sparse Big Data

Yasir, Muhammad; Habib, Muhammad Asif; Ashraf, Muhammad; Sarwar, S.; Chaudhry, Muhammad Umar; Shahwani, Hamayoun; Faisal, C. M. Nadeem

doi:10.1109/access.2020.2971834

Cited by 10 publications

(10 citation statements)

References 74 publications

(58 reference statements)


“…But the execution operation seems bit complex was the major drawback of this approach. Yasir et al 33 demonstrated the Deferring GENEration of Power sets (D‐GENE) that discovers frequent itemsets in sparse big data. Sparseness value, running time, number of transactions, minimum support were the evaluation metrics employed in this approach.…”

Section: Review Of Related Work (mentioning)

confidence: 99%

Association rule mining based fuzzy manta ray foraging optimization algorithm for frequent itemset generation from social media

Lakshmi, Krishnamurthy

2021

Concurrency and Computation


Data mining is now widely used and has attracted a great deal of attention. Numerous approaches to frequent itemset mining and association rule mining (ARM) have been presented in recent years, but limited scalability and long processing time remain major drawbacks that lead to poor-quality solutions. To overcome these shortcomings, this article proposes three phases, namely data pre-processing, frequent itemset mining, and ARM. In the data pre-processing phase, the collected Twitter datasets are pre-processed to eliminate redundant data and convert them into an appropriate format for further mining. In the frequent itemset mining phase, an Apriori algorithm is employed for the exact mining of frequent itemsets. The ARM phase uses the fuzzy manta ray foraging (FMRF) optimization algorithm to generate association rules from the large itemsets while meeting the minimum confidence and minimum support values. The datasets are recent tweets regarding Covid-19, trump2020, joebiden, draintheswamp, and Godzilla, collected from Twitter. The experimental analysis and comparative evaluation are carried out for various simulation measures, and the results show that the proposed approach performs effectively compared with various existing approaches.
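The role of the minimum support and minimum confidence thresholds mentioned in this abstract can be illustrated with a small Python sketch. It is not the FMRF optimizer or the Apriori implementation from the cited article: it enumerates candidate itemsets exhaustively, and the function names `support` and `association_rules`, the toy transactions, and the threshold values are all assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def association_rules(transactions, min_support, min_confidence):
    """Brute-force rule generation (illustrative only, exponential in item count)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(2, len(items) + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            s = support(itemset, transactions)
            # Skip itemsets that never occur or fall below the support threshold.
            if s == 0 or s < min_support:
                continue
            # Split the frequent itemset into antecedent -> consequent.
            for r in range(1, k):
                for antecedent in combinations(candidate, r):
                    a = frozenset(antecedent)
                    conf = s / support(a, transactions)
                    if conf >= min_confidence:
                        consequent = tuple(sorted(itemset - a))
                        rules.append((antecedent, consequent, s, conf))
    return rules


# Hypothetical toy data: each transaction is the set of tags in one post.
toy = [{"x", "y"}, {"x", "y"}, {"x"}, {"y", "z"}]
rules = association_rules(toy, min_support=0.5, min_confidence=0.6)
for antecedent, consequent, s, conf in rules:
    print(antecedent, "->", consequent, f"support={s:.2f}", f"confidence={conf:.2f}")
```

Here a rule A -> B is kept only when the itemset A ∪ B reaches the support threshold and support(A ∪ B) / support(A) reaches the confidence threshold.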


“…Recently, a new FIM algorithm, Deferring the Generation of Power sets for Mining Frequent Itemsets in Sparse Big data (D-GENE), was proposed [12]. D-GENE uses the concept of power set from set theory to generate an Iterative Trimmed Transaction Lattice (ITTL) of each transaction.…”

Section: Related Work (mentioning)

confidence: 99%
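The idea quoted above, enumerating the power set of each transaction only after infrequent items have been trimmed away, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the D-GENE algorithm itself: the ITTL data structure is not described on this page, so a plain `Counter` stands in for it, and the function name `frequent_itemsets_deferred`, the absolute `min_support` count, and the toy transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_deferred(transactions, min_support):
    """Count itemsets via subsets of trimmed transactions.

    `min_support` is an absolute transaction count (an assumption of this sketch).
    """
    # Pass 1: count single items and keep only the frequent ones.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: trim each transaction first, and only then enumerate its
    # non-empty subsets. An itemset containing an infrequent item can never
    # be frequent, so trimming loses no frequent itemset.
    subset_counts = Counter()
    for t in transactions:
        trimmed = sorted(set(t) & frequent_items)
        for k in range(1, len(trimmed) + 1):
            for subset in combinations(trimmed, k):
                subset_counts[subset] += 1

    return {s: c for s, c in subset_counts.items() if c >= min_support}


# Hypothetical toy transactions.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "c"],
]
result = frequent_itemsets_deferred(transactions, min_support=2)
print(result)
```

Subset enumeration is exponential in the length of a trimmed transaction, so a sketch like this is only plausible when transactions are short after trimming, which is the sparse setting the citing papers describe.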

“…This section covers the working principle and the implementation details of the D-GENE algorithm for mining frequent co-occurring diseases [12]. In the first place, the detailed explanation of the dataset, preprocessing, and transformation steps is provided.…”

Section: Mining Frequent Co-occurring Diseases (mentioning)

confidence: 99%

“…D-GENE was chosen due to its superior runtime efficiency as well as having the least memory consumption, especially for sparse data. D-GENE has shown its superiority over other state-of-the-art methods like Apriori and FP-growth [12]. Considering the sparse nature of the data used in this study, D-GENE was believed to be the optimal choice to discover frequent co-occurring diseases from them.…”

mentioning

confidence: 96%


D-GENE-Based Discovery of Frequent Occupational Diseases among Female Home-Based Workers

Yasir, Ashraf, Chaudhry, et al.

2021

Electronics

Self Cite


A considerable fraction of the female workforce worldwide is making ends meet by doing various jobs informally at home or in nearby places, rather than at employers’ premises. The contribution of these female home-based workers (FHBWs) is significant to the country’s economic growth. FHBWs are often confronted with numerous occupational diseases due to a lack of awareness of occupational safety and health measures, and unhealthy living and working conditions. The informality of FHBWs prevents them from getting proper healthcare, safety, and other dispensations enjoyed by formal employees. Despite their undeniable importance, health issues of FHBWs are still overlooked. This study is an attempt to discover the frequent co-occurring occupational diseases encountered by FHBWs in Punjab, a province of Pakistan. Frequent itemset mining (FIM) or co-occurrence grouping is a technique of data science that identifies the associations among different entities in the data. Based on FIM, the D-GENE algorithm is applied in this study to efficiently discover frequent co-occurring diseases in the data obtained from the Punjab Home-based Workers Survey (2016). The far-reaching goal of the study is to bring awareness of the occupational health issues and safety risks to the health authorities as well as to the FHBWs.


“…EAFIM [20] that uses the Apache Spark framework to achieve parallelism is an improved version of the apriori algorithm. Yasir, Muhammad, et al propose the HARPP [21], which adopt the concern of pow set and dictionary data structures, and the D-GENE [22], which suspends the process of ITTL generation till the completion of transaction pruning phase, discovering frequent itemsets from sparse datasets.…”

Section: Related Work (mentioning)

confidence: 99%

A Fast Approach for Up-Scaling Frequent Itemsets

2020


With the rapid growth of data scale and the diversification of demand, there is an urgent need to extract useful frequent itemsets from datasets of different scales. Traditional methods can certainly solve the problem, but they do not fully exploit the relationships among datasets of different scales. This paper proposes a fast approach: the frequent itemsets of the large-scale data are inferred directly from the frequent itemsets of the small-scale datasets, rather than mined again from the large-scale dataset, provided that the frequent itemsets of the small-scale datasets have already been mined. We conduct extensive experiments on one synthetic dataset and four UCI datasets. The experimental results show that our algorithm is significantly faster and consumes less memory than leading algorithms.

Index terms: up-scaling, up-scaling frequent itemsets, frequent itemset mining, data mining.
