Clustering Based Under-Sampling for Improving Speaker Verification Decisions Using AdaBoost

Altınçay, Hakan; Ergun, Cem

doi:10.1007/978-3-540-27868-9_76

Cited by 19 publications

(15 citation statements)

References 7 publications


“…Besides, Liu et al [27] also presented a weighted rough set method for this problem. However, all of these techniques have some disadvantages [2]. For instance, the computational load is increased and overtraining may occur due to replicated samples in the case of over-sampling.…”

Section: Introduction (mentioning)

confidence: 99%

“…The first group involves five approaches: (1) under-sampling, a method in which the minority population is kept intact, while the majority population is under-sampled; (2) over-sampling, methods in which the minority examples are over-sampled so that the desired class distribution is obtained in the training set [6,11,19]; (3) cluster based sampling, methods in which the representative examples are randomly sampled from clusters [2]; (4) moving the decision threshold, methods in which the researcher tries to adapt the decision thresholds to impose bias on the minority class [11,21,24] and (5) adjust costs matrices, a method in which the prediction accuracy is improved by adjusting the cost (weight) for each class [15]. Besides, Liu et al [27] also presented a weighted rough set method for this problem.…”

Section: Introduction (mentioning)

confidence: 99%


An information granulation based data mining approach for classifying imbalanced data

Chen, Hsu, et al. 2008

Information Sciences


Abstract: Recently, the class imbalance problem has attracted much attention from researchers in the field of data mining. When learning from imbalanced data in which most examples are labeled as one class and only a few belong to another class, traditional data mining approaches do not have a good ability to predict the crucial minority instances. Unfortunately, many real-world data sets, such as those for health examination, inspection, credit fraud detection, spam identification and text mining, are all faced with this situation. In this study, we present a novel model called the “Information Granulation Based Data Mining Approach” to tackle this problem. The proposed methodology, which imitates the human ability to process information, acquires knowledge from Information Granules rather than from numerical data. This method also introduces a Latent Semantic Indexing based feature extraction tool by using Singular Value Decomposition, to dramatically reduce the data dimensions. In addition, several data sets from the UCI Machine Learning Repository are employed to demonstrate the effectiveness of our method. Experimental results show that our method can significantly increase the ability of classifying imbalanced data.


“…Altincay and Ergun [2] used this idea for improving speaker verification decisions by using AdaBoost and by using the k-means approach as a clustering algorithm. They claimed that their approach displayed better performance than the random under-sampling method.…”

Section: Related Work (mentioning)

confidence: 99%

Under-sampling by algorithm with performance guaranteed for class-imbalance problem

Jindaluang, Chouvatut, Kantabutra

2014

2014 International Computer Science and Engineering Conference (ICSEC)


The class-imbalance problem is the problem that the amount of data in the majority class is much larger than in the minority class. Traditional classifiers cannot sort out this problem because they focus more on the data in the majority class than on the data in the minority class, and then they predict some upcoming data as the data in the majority class. Under-sampling is an efficient way to handle this problem because this method selects the representatives of the data in the majority class. For this reason, under-sampling requires a shorter training period than over-sampling. The only problem with the under-sampling method is that a representative selection, in all probability, throws away important information in the majority class. To overcome this problem, we propose a cluster-based under-sampling method. We use a clustering algorithm that is performance guaranteed, named the k-centers algorithm, which clusters the data in the majority class and selects a number of representative data in many proportions, and then combines them with all the data in the minority class as a training set. In this paper, we compare our approach with k-means on five datasets from UCI with two classifiers: 5-nearest neighbors and the C4.5 decision tree. The performance is measured by Precision, Recall, F-measure, and Accuracy. The experimental results show that our approach has higher measurements than the k-means approach, except for Precision, where both approaches have the same rate.
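The abstract above does not spell out which k-centers variant is used. As a minimal sketch only, the block below assumes the classic greedy farthest-first traversal, a well-known k-center heuristic with a factor-2 approximation guarantee, and uses the chosen centers as the majority-class representatives. The function names `farthest_first_centers` and `undersample_majority` are hypothetical and do not come from the paper.

```python
import math


def farthest_first_centers(points, k):
    """Return indices of k centers picked by greedy farthest-first traversal.

    Assumption: this is one standard k-center heuristic, not necessarily
    the exact procedure used in the cited paper.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [0]  # arbitrary starting point: the first sample
    # dist[i] holds the distance from point i to its nearest chosen center
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        # the next center is the point farthest from all current centers
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return centers


def undersample_majority(majority, minority, k):
    """Keep k majority representatives and all minority samples."""
    kept = [majority[i] for i in farthest_first_centers(majority, k)]
    return kept + list(minority)
```

Because the representatives are actual majority samples rather than averaged centroids, this variant never creates synthetic points, which is one plausible reason to prefer it over a k-means based selection.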


“…Altınçay et al [18] proposed cluster based synthetic sample creation techniques to under-sample the majority class. They divide the majority class samples into N number of clusters, where N is the number of minority class samples in the dataset.…”
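The idea in this statement can be sketched as follows: cluster the majority class into N groups, with N equal to the number of minority samples, and use the N cluster centroids as synthetic majority samples so that both classes end up the same size. This is an illustrative sketch only. It assumes plain Lloyd-style k-means with a seeded random initialisation, and the names `kmeans_centroids` and `cluster_undersample` as well as the 0/1 labels are hypothetical.

```python
import math
import random


def kmeans_centroids(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns k centroids as tuples."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # recompute centroids; an empty cluster keeps its old centroid
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cluster_undersample(majority, minority):
    """Replace the majority class by N centroids, N = len(minority).

    Returns (sample, label) pairs with label 0 for the synthetic
    majority representatives and 1 for the untouched minority samples.
    """
    reps = kmeans_centroids(majority, len(minority))
    return [(x, 0) for x in reps] + [(tuple(x), 1) for x in minority]
```

The resulting balanced set can then be passed to any classifier, for example a boosted ensemble as in the cited work.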

Section: Related Work (mentioning)

confidence: 99%

Cluster-based majority under-sampling approaches for class imbalance learning

Zhang, Wang

2010

2010 2nd IEEE International Conference on Information and Financial Engineering


The class imbalance problem usually occurs in real applications. Class imbalance means that the amount of one class may be much less than that of another in the training set. Under-sampling is a very popular approach to deal with this problem. The under-sampling approach is very efficient because it uses only a subset of the majority class. The drawback of under-sampling is that it throws away many potentially useful majority class examples. To overcome this drawback, we adopt an unsupervised learning technique for supervised learning. We propose cluster-based majority under-sampling approaches for selecting a representative subset from the majority class. Compared to under-sampling, cluster-based under-sampling can effectively avoid the loss of important information from the majority class. We adopt two methods to select a representative subset from k clusters with certain proportions, and then use the representative subset and all the minority class samples as training data to improve accuracy over minority and majority classes. In the paper, we compare the behaviors of our approaches with the traditional random under-sampling approach on ten UCI repository datasets using the following classifiers: k-nearest neighbor and Naïve Bayes classifier. Recall, Precision, F-measure, G-mean and BACC (balanced accuracy) are used for evaluating performance of classifiers. Experimental results show that our cluster-based majority under-sampling approaches outperform the random under-sampling approach. Our approaches attain better overall performance on the k-nearest neighbor classifier compared to the Naïve Bayes classifier.
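The evaluation measures named in this abstract can be computed from the four confusion-matrix counts. The sketch below assumes the commonly used definitions, with the minority class as the positive class, G-mean as the geometric mean of recall and specificity, and BACC as their arithmetic mean. The abstract does not state its exact formulas, and the function name `imbalance_metrics` is hypothetical.

```python
import math


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def imbalance_metrics(tp, fp, tn, fn):
    """Compute common imbalance-aware metrics from confusion counts.

    tp/fn count minority (positive) samples, tn/fp count majority
    (negative) samples. Definitions are the usual ones and are an
    assumption with respect to the cited paper.
    """
    recall = _ratio(tp, tp + fn)        # true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
        "bacc": (recall + specificity) / 2,
    }
```

Unlike plain accuracy, G-mean and BACC drop sharply when a classifier ignores the minority class, which is why they are common choices for comparing under-sampling strategies.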
