2017
DOI: 10.1186/s40537-017-0107-2
Scaling associative classification for very large datasets

Abstract: Introduction. In recent years, Big Data has received much attention from both the academic and the industrial world, with the aim of fully leveraging the power of the information it hides. The dimensions along which very large datasets usually extend are mainly the size, i.e., the disk storage occupied; the volume, i.e., the number of records; the dimensionality, i.e., the number of features a record can have; and the domain, i.e., the number of distinct values a feature can take. A special effort has been dedicate…

Cited by 13 publications (13 citation statements). References 22 publications.
“…where Supp(X =>Y) denotes the support of pattern <if X then Y > while P(X ∪ Y) represents the probability of occurrence of itemset X with class label Y (Hadi, Al-Radaideh & Alhawari, 2018;Nguyen et al, 2018). Definition 7 Confidence of a pattern (X => Y) (Venturini, Baralis & Garza, 2018;Hadi, Al-Radaideh & Alhawari, 2018) is calculated as:…”
Section: Basic Terms Of Associative Classification (mentioning; confidence: 99%)
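The support and confidence definitions quoted above can be sketched in a few lines of Python. The toy records and the rule {a} => pos below are illustrative examples, not data from the paper:

```python
# Each record is (itemset, class_label). For a rule X => Y:
#   supp(X => Y) = P(X ∪ Y): fraction of records containing X that carry label Y
#   conf(X => Y) = supp(X => Y) / supp(X)

def support(records, X, Y):
    """Fraction of all records that contain every item of X and have label Y."""
    hits = sum(1 for items, label in records if X <= items and label == Y)
    return hits / len(records)

def confidence(records, X, Y):
    """supp(X => Y) divided by the fraction of records that contain X."""
    covering = sum(1 for items, _ in records if X <= items)
    if covering == 0:
        return 0.0
    hits = sum(1 for items, label in records if X <= items and label == Y)
    return hits / covering

records = [
    ({"a", "b"}, "pos"),
    ({"a"}, "pos"),
    ({"a", "b"}, "neg"),
    ({"b"}, "neg"),
]
print(support(records, {"a"}, "pos"))     # 0.5  (2 of 4 records)
print(confidence(records, {"a"}, "pos"))  # ≈ 0.667 (2 of the 3 records with "a")
```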
“…• DAC [32]. Ensemble learning which distributes the training of an associative classifier among parallel workers.…”
Section: Big Data Algorithms (mentioning; confidence: 99%)
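The distribution scheme attributed to DAC in the quote above can be sketched as a generic ensemble wrapper: each parallel worker trains on its own data shard, and predictions are combined by majority vote. The `train` and `predict` hooks here are placeholders for any associative-classification learner, not DAC's actual API:

```python
from collections import Counter

def train_ensemble(shards, train):
    """Train one model per data shard (one shard per parallel worker)."""
    return [train(shard) for shard in shards]

def predict_ensemble(models, predict, record):
    """Combine the per-worker predictions by majority vote."""
    votes = Counter(predict(model, record) for model in models)
    return votes.most_common(1)[0][0]
```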
“…DM contains a rich set of classification models; specifically, Support Vector Machine [9], Rule Based [10], Decision Tree [11,12], Bayesian classification [2], k -Nearest Neighbor [13], and AC [14]. Among all, AC is relatively new and promising [15,16,17,18,19,20,21,22,23,24] as it combines the best approaches of association rules mining (ARM) and classification. AC is based on ARM where, first, the strongest Class Association Rules (CAR) are discovered from dataset, followed by converting those rules into classifier model.…”
Section: Introduction (mentioning; confidence: 99%)
“…After the introduction of AC in 1997, numbers of algorithms are developed in this family e.g. CBA [14,25], CMAR [18], CPAR [21], MCAR [19], MAC [26], CMARAA [27], MRAC & MRAC+ [15], DAC [23], CBA-Spark and CPAR-Spark [24] and G3P-ACBD [28]. Almost all consist of three basic steps -a) Association rule generation, b) Classifier building -rule pruning and rule ranking c) Classification of unknown records using the classifier.…”
Section: Introduction (mentioning; confidence: 99%)
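The three basic steps listed in the last quote (rule generation; rule pruning and ranking; classification of unknown records) can be illustrated with a deliberately simplified CBA-like sketch. The thresholds, the naive itemset enumeration, and the first-match classification strategy are assumptions for illustration, not the published algorithms:

```python
from collections import Counter
from itertools import combinations

def mine_cars(records, min_supp=0.2, min_conf=0.6, max_len=2):
    """Step (a): enumerate class association rules X => y above the thresholds."""
    n = len(records)
    items = sorted({i for itemset, _ in records for i in itemset})
    cars = []
    for k in range(1, max_len + 1):
        for X in combinations(items, k):
            X = frozenset(X)
            covering = [label for itemset, label in records if X <= itemset]
            if not covering:
                continue
            for y, hits in Counter(covering).items():
                supp, conf = hits / n, hits / len(covering)
                if supp >= min_supp and conf >= min_conf:
                    cars.append((X, y, supp, conf))
    return cars

def build_classifier(cars, records):
    """Step (b): rank rules by confidence, then support; pick a default class."""
    ranked = sorted(cars, key=lambda r: (-r[3], -r[2]))
    default = Counter(label for _, label in records).most_common(1)[0][0]
    return ranked, default

def classify(ranked, default, itemset):
    """Step (c): first matching rule wins; otherwise fall back to the default."""
    for X, y, _, _ in ranked:
        if X <= itemset:
            return y
    return default
```

For example, training on four toy records and classifying an unseen itemset applies the highest-ranked rule whose antecedent is contained in the record, falling back to the majority class when no rule matches.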