“…To evaluate the power of the candidate features, it is necessary to calculate the proportion of the revenue increase coalitions according to Eqs. (2) and (1). Theoretically, calculating the Shapley value requires summing over all possible feature subsets, which may lead to high computational complexity.…”
Section: Computational Complexity
“…E NSEMBLE methods are learning algorithms that construct and combine a set of classifiers to classify new unseen data [1]. They tend to use multiple learning algorithms for better predictive performance compared with any other constituent learning algorithms alone [2][3][4][5].…”
The original random forests algorithm has been widely used and has achieved excellent performance in classification and regression tasks. However, research on the theory of random forests lags far behind its applications. To narrow this gap, this paper proposes a new random forests algorithm, called random Shapley forests (RSFs), based on the Shapley value. The Shapley value is a well-known solution concept in cooperative game theory that fairly assesses the power of each player in a game. During construction, RSFs use the Shapley value to evaluate the importance of each feature at each tree node by computing the dependency among the possible feature coalitions. In particular, inspired by existing consistency theory, we prove the consistency of the proposed algorithm. To verify its effectiveness, experiments were conducted on eight UCI benchmark datasets and four real-world datasets. The results show that RSFs perform better than, or at least comparably to, the existing consistent random forests, the original random forests, and a classic classifier, support vector machines.
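The exact Shapley computation underlying this kind of feature scoring can be sketched as follows. This is a minimal illustration, not the paper's method: the characteristic function `v` below is a toy additive payoff (feature 0 contributes twice as much as the others), standing in for whatever coalition-value function RSFs use. The enumeration over all coalitions also makes the exponential cost noted in the quoted passage concrete.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, v):
    """Exact Shapley values by enumerating all coalitions.

    features: list of player (feature) ids
    v: characteristic function mapping a frozenset of features to a payoff
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for r in range(n):
            for subset in combinations(others, r):
                S = frozenset(subset)
                # Weight |S|!(n-|S|-1)!/n!: the fraction of orderings in
                # which exactly the members of S precede player i.
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

# Toy additive characteristic function (illustrative assumption only):
# feature 0 is worth 2.0 per coalition, every other feature 1.0.
def v(S):
    return sum(2.0 if f == 0 else 1.0 for f in S)

print(shapley_values([0, 1, 2], v))  # feature 0 receives the largest share
```

For an additive game like this toy `v`, each feature's Shapley value equals its individual contribution, and the values sum to `v` of the full coalition (the efficiency axiom) — a quick sanity check on the implementation.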
“…The advantage of the bagging technique is that it reduces the variance of an algorithm by aligning estimates with the desired outcome, thereby improving a model's accuracy [12]. The out-of-bag prediction $H^{oob}(x)$ for an instance $x$ uses only the learners that were not trained on $x$ [13]. The formula is $H^{oob}(x) = \arg\max_{y \in Y} \sum_{t=1}^{T} I(h_t(x) = y) \cdot I(x \notin D_t)$, where $x$ is an instance, $Y$ is the output space, $T$ is the number of learners ($t = 1, \ldots, T$), $h_t$ is the $t$-th learner trained on bootstrap sample $D_t$, and $I(\cdot)$ is the indicator function.…”
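Bagging's out-of-bag prediction — letting each training point be voted on only by the learners whose bootstrap sample excluded it — can be sketched as below. The 1-D threshold-stump base learner and the toy dataset are illustrative assumptions, not from the paper:

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Best single-threshold classifier on 1-D data (tiny toy base learner)."""
    best = None
    for t in sorted(set(xs)):
        for lo, hi in ((0, 1), (1, 0)):
            preds = [hi if x >= t else lo for x in xs]
            acc = sum(p == y for p, y in zip(preds, ys))
            if best is None or acc > best[0]:
                best = (acc, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: hi if x >= t else lo

def bagging_oob(xs, ys, T=25, seed=0):
    """Bagging with out-of-bag predictions."""
    rng = random.Random(seed)
    n = len(xs)
    learners, inbag = [], []
    for _ in range(T):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        learners.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
        inbag.append(set(idx))
    # Out-of-bag prediction: point i is voted on only by learners
    # whose bootstrap sample did not contain it.
    oob_pred = []
    for i, x in enumerate(xs):
        votes = [h(x) for h, bag in zip(learners, inbag) if i not in bag]
        oob_pred.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return oob_pred

xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(bagging_oob(xs, ys))
```

Because each point is excluded from roughly 37% of the bootstrap samples, the OOB vote gives an unbiased accuracy estimate without a separate hold-out set — the property the quoted formula formalizes.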
Abstract—The rapid growth of online shopping sites makes business in the virtual world very promising. Purchase intention is one of the keys to success for an online store. Several data mining methods can make predictions on online purchase intention datasets, where the data represent the characteristics or habits of each user who has visited a site, whether or not the visit ends in a transaction. Popular algorithms with good performance in data mining include J48 and Logistic Regression. However, data sometimes suffer from class imbalance, so an ensemble technique needs to be applied; one such technique is bagging. This research applies bagging to improve the performance of the J48 and Logistic Regression algorithms. With this technique, the J48 algorithm reaches an accuracy of 89.68% and Logistic Regression 88.50%, an improvement over the initial tests without ensemble techniques. Recall, F-Measure, and AUC values also improved.
Keywords—purchase intention; J48; Logistic Regression; bagging
“…In general, the AdaBoost algorithm trains base classifiers sequentially: in each iteration it uses training data with weight coefficients that depend on the classifiers' performance in the previous iteration, assigning larger weights to misclassified examples (Schapire & Freund, 2013), (Schwenker, 2013).…”
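The sequential reweighting described in the quote can be sketched as follows. This is a minimal AdaBoost illustration under stated assumptions: the 1-D toy dataset and the pool of threshold stumps are invented for the example, and labels are taken in {-1, +1}:

```python
import math

def adaboost(xs, ys, weak_learners, T=10):
    """AdaBoost: train base classifiers sequentially, reweighting the data
    each round so that misclassified points receive larger weight.
    ys in {-1, +1}; weak_learners is a pool of candidate h(x) -> {-1, +1}."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, h) pairs
    for _ in range(T):
        # Pick the pool member with the lowest weighted training error.
        def werr(h):
            return sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
        h = min(weak_learners, key=werr)
        err = werr(h)
        if err == 0:            # perfect learner: keep it and stop
            ensemble.append((1.0, h))
            break
        if err >= 0.5:          # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Upweight mistakes, downweight correct points, then renormalize.
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# Toy 1-D data and a pool of threshold stumps (illustrative assumptions).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, -1, -1, -1, 1, 1, 1]
stumps = [lambda x, t=t, s=s: s * (1 if x >= t else -1)
          for t in range(1, 9) for s in (1, -1)]
predict = adaboost(xs, ys, stumps)
print(sum(predict(x) == y for x, y in zip(xs, ys)), "of 8 training points correct")
```

Note that no single stump can fit this label pattern; the weighted combination of a few stumps can, which is exactly the point of boosting.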
The enormous volume of data in the banking industry is very difficult, if not impossible, to analyze manually to obtain information useful for policy making. Data mining is therefore expected to contribute to processing such data. Many methods have been used to classify data; one of them is the support vector machine. This study aims to classify customers likely to subscribe to a term deposit in the bank marketing dataset. The research proposes an extension of the support vector machine, the least squares support vector machine, ensembled using boosting. The data processed is the bank marketing dataset. The results show that the proposed ensemble least squares support vector machine outperforms the other methods, with accuracy, sensitivity, and specificity of 95.15%, 92.93%, and 97.61%, respectively, for an overall average classification rate of 95.23%.
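The core of the least squares SVM — replacing the SVM's quadratic program with a single linear system — can be sketched as below. This is a generic LSSVM illustration, not the study's implementation: the RBF kernel, the hyperparameters `gamma` and `sigma`, and the toy 2-D data are assumptions for the example.

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Least squares SVM: instead of solving a QP, solve the linear system
        [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    where K is the kernel matrix (RBF kernel here)."""
    n = len(y)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / (2 * sigma ** 2))
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma   # 1/gamma acts as a ridge term
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]

    def predict(Z):
        sq_z = np.sum((Z[:, None, :] - X[None, :, :]) ** 2, axis=2)
        Kz = np.exp(-sq_z / (2 * sigma ** 2))
        return np.sign(Kz @ alpha + b)

    return predict

# Toy linearly separable data (illustrative only).
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
predict = lssvm_train(X, y)
print(predict(X))
```

The boosting ensemble the study proposes would then train several such classifiers sequentially on reweighted data, in the manner of the AdaBoost sketch above; the single-model solve shown here is the building block.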