Questions? Contact the NRC Publications Archive team at PublicationsArchive-ArchivesPublications@nrc-cnrc.gc.ca. If you wish to email the authors directly, please see the first page of the publication for their contact information.
NRC Publications Archive / Archives des publications du CNRC

SCUT: multi-class imbalanced data classification using SMOTE and cluster-based undersampling
Agrawal, Astha; Viktor, Herna L.; Paquet, Eric

This publication could be one of several versions: author's original, accepted manuscript or the publisher's version. For the publisher's version, please access the DOI link below.
http://doi.org/10.5220/0005595502260234
NRC Publications Record: http://nparc.cisti-icist.nrc-cnrc.gc.ca/eng/view/object/?id=e8c7556d-9f94-466f-a1e5-72cdf9b9513f

Abstract: Class imbalance is a crucial problem in machine learning and occurs in many domains. Specifically, the two-class problem has received interest from researchers in recent years, leading to solutions for oil spill detection, tumour discovery and fraudulent credit card detection, amongst others. However, handling class imbalance in datasets that contain multiple classes, with varying degrees of imbalance, has received limited attention. In such a multi-class imbalanced dataset, the classification model tends to favour the majority classes and to incorrectly classify instances from the minority classes as belonging to the majority classes, leading to poor predictive accuracies. Further, there is a need both to handle the imbalance between classes and to address the selection of examples within a class (i.e. the so-called within-class imbalance). In this paper, we propose the SCUT hybrid sampling method, which is used to balance the number of training examples in such a multi-class setting. Our SCUT approach oversamples minority class examples through the generation of synthetic examples and employs cluster analysis in order to undersample majority classes. In addition, it handles both within-class and between-class imbalance. Our exp...
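
The hybrid sampling idea described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. Several choices below are assumptions made purely for illustration: the target size per class is taken to be the mean class size, the clustering step is a plain k-means (the abstract only says "cluster analysis"), and all names (`smote_like`, `cluster_undersample`, `scut_like_resample`, `k`, `n_clusters`) are hypothetical.

```python
import math
import random


def smote_like(points, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen point and one of its k nearest neighbours in the same class."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        neighbours = sorted((p for p in points if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        if not neighbours:  # single-point class: fall back to duplication
            synthetic.append(tuple(base))
            continue
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain k-means (an assumed stand-in for the clustering step)."""
    centres = rng.sample(points, min(n_clusters, len(points)))
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Recompute each centre as the mean of its members (keep it if empty).
        centres = [tuple(sum(c) / len(members) for c in zip(*members))
                   if members else centre
                   for members, centre in zip(clusters, centres)]
    return [members for members in clusters if members]


def cluster_undersample(points, n_keep, n_clusters, rng):
    """Keep n_keep points, drawn round-robin from the clusters so that every
    cluster (and hence every region of the class) stays represented."""
    n_keep = min(n_keep, len(points))
    clusters = kmeans(points, n_clusters, rng)
    for members in clusters:
        rng.shuffle(members)
    kept = []
    while len(kept) < n_keep:
        for members in clusters:
            if members and len(kept) < n_keep:
                kept.append(members.pop())
    return kept


def scut_like_resample(data, k=5, n_clusters=3, seed=0):
    """Balance every class to the mean class size (assumed target)."""
    rng = random.Random(seed)
    target = round(sum(len(v) for v in data.values()) / len(data))
    balanced = {}
    for label, points in data.items():
        points = [tuple(p) for p in points]
        if len(points) < target:    # minority class: oversample
            balanced[label] = points + smote_like(points, target - len(points), k, rng)
        elif len(points) > target:  # majority class: undersample
            balanced[label] = cluster_undersample(points, target, n_clusters, rng)
        else:
            balanced[label] = points
    return balanced


# Toy three-class dataset with class sizes 30, 6 and 3.
demo_rng = random.Random(1)
demo = {
    "a": [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(30)],
    "b": [(demo_rng.gauss(5, 1), demo_rng.gauss(5, 1)) for _ in range(6)],
    "c": [(demo_rng.gauss(-5, 1), demo_rng.gauss(5, 1)) for _ in range(3)],
}
balanced = scut_like_resample(demo)
print({label: len(points) for label, points in balanced.items()})
```

Drawing the retained majority examples from every cluster, rather than sampling the class at random, is one simple way to keep sparse regions of a class represented, which is the within-class concern the abstract raises. In practice a maintained library implementation of SMOTE and of the clustering step would likely be preferable to this hand-rolled version.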