2014
DOI: 10.1007/s11227-014-1151-8

Scalable CAIM discretization on multiple GPUs using concurrent kernels

Abstract: CAIM (Class-Attribute Interdependence Maximization) is one of the state-of-the-art algorithms for discretizing data for which classes are known. However, it may take a long time when run on high-dimensional, large-scale data with a large number of attributes and/or instances. This paper presents a solution to this problem by introducing a GPU-based implementation of the CAIM algorithm that significantly speeds up the discretization process on large, complex data sets. The GPU-based implementation is scalable to multiple GPUs.
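To make the abstract's approach concrete, here is a minimal sketch (not the paper's actual implementation) of the two ingredients involved: the per-attribute class-interval count table (the "quanta matrix") that the CAIM criterion scores, and one CUDA stream per attribute so that the histogram kernels for different attributes can execute as concurrent kernels. All identifiers (buildQuantaMatrix, caimValue, the fixed cut points) are illustrative assumptions; the criterion itself is the standard CAIM value, (1/n) Σ_r max_r² / M_r over the n intervals.

```cuda
// Minimal sketch, NOT the paper's code: per-attribute quanta-matrix histograms
// launched on separate CUDA streams so the kernels can execute concurrently.
// All identifiers here are illustrative assumptions.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// One thread per instance: find this instance's interval for the given
// attribute and bump the (class, interval) cell of the quanta matrix.
__global__ void buildQuantaMatrix(const float *attr, const int *label,
                                  const float *bounds,   // K-1 inner cut points
                                  int K, int N, int *quanta /* C x K counts */)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    int r = 0;
    while (r < K - 1 && attr[i] > bounds[r]) ++r;        // locate interval
    atomicAdd(&quanta[label[i] * K + r], 1);
}

// CAIM criterion on the host: (1/K) * sum_r (max_r^2 / M_r), where max_r is
// the dominant class count in interval r and M_r is the interval's total.
double caimValue(const int *quanta, int C, int K)
{
    double sum = 0.0;
    for (int r = 0; r < K; ++r) {
        long maxr = 0, Mr = 0;
        for (int c = 0; c < C; ++c) {
            long q = quanta[c * K + r];
            Mr += q;
            if (q > maxr) maxr = q;
        }
        if (Mr > 0) sum += (double)(maxr * maxr) / Mr;
    }
    return sum / K;
}

int main()
{
    const int N = 1 << 20, A = 8, C = 3, K = 4;          // instances, attributes, classes, intervals
    std::vector<float> h_attr((size_t)N * A);
    std::vector<int> h_label(N);
    for (int i = 0; i < N; ++i) h_label[i] = rand() % C; // toy labels
    for (size_t i = 0; i < h_attr.size(); ++i)           // toy attribute values in [0,1)
        h_attr[i] = (float)rand() / RAND_MAX;
    const float h_bounds[K - 1] = {0.25f, 0.5f, 0.75f};  // one fixed candidate scheme

    float *d_attr, *d_bounds; int *d_label, *d_quanta;
    cudaMalloc(&d_attr, h_attr.size() * sizeof(float));
    cudaMalloc(&d_label, N * sizeof(int));
    cudaMalloc(&d_bounds, (K - 1) * sizeof(float));
    cudaMalloc(&d_quanta, A * C * K * sizeof(int));
    cudaMemcpy(d_attr, h_attr.data(), h_attr.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_label, h_label.data(), N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_bounds, h_bounds, (K - 1) * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_quanta, 0, A * C * K * sizeof(int));

    std::vector<cudaStream_t> streams(A);
    for (int a = 0; a < A; ++a) cudaStreamCreate(&streams[a]);

    dim3 block(256), grid((N + 255) / 256);
    for (int a = 0; a < A; ++a)   // one stream per attribute -> concurrent kernels
        buildQuantaMatrix<<<grid, block, 0, streams[a]>>>(
            d_attr + (size_t)a * N, d_label, d_bounds, K, N, d_quanta + a * C * K);
    cudaDeviceSynchronize();

    std::vector<int> h_quanta(A * C * K);
    cudaMemcpy(h_quanta.data(), d_quanta, h_quanta.size() * sizeof(int), cudaMemcpyDeviceToHost);
    for (int a = 0; a < A; ++a)
        printf("attribute %d: CAIM = %.4f\n", a, caimValue(&h_quanta[(size_t)a * C * K], C, K));

    for (int a = 0; a < A; ++a) cudaStreamDestroy(streams[a]);
    cudaFree(d_attr); cudaFree(d_label); cudaFree(d_bounds); cudaFree(d_quanta);
    return 0;
}
```

In the full algorithm the cut-point set grows greedily: each iteration scores every remaining candidate boundary with the CAIM criterion, keeps the best one, and stops once CAIM no longer improves (typically after the interval count reaches the number of classes).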

Cited by 7 publications (3 citation statements) · References 39 publications

Citation statements (ordered by relevance):
“…Some works have tried to deal with large‐scale discretization. For example, in Ref , the authors proposed a scalable implementation of Class‐Attribute Interdependence Maximization algorithm by using GPU technology. In Ref , a discretizer based on windowing and hierarchical clustering is proposed to improve the performance of classical tree‐based classifiers.…”
Section: Taxonomy (mentioning)
confidence: 99%
“…They offer higher scalability to big data problems for a fraction of the cost of a traditional mainframe solution. GPUs are particularly efficient for streaming environments and provide a very fast decision with minimum label latency [22][23][24][25][26][27]. However, they are often associated with a more difficult code implementation and limited memory, which makes it difficult to scale to true big data problems.…”
Section: Data Stream Mining for Online Learning (mentioning)
confidence: 99%
“…Therefore, it can easily scale to large problems. Moreover, discretization of multiple attributes can be parallelized using CPU threads or GPUs [7]. Table 13 shows the discretization time for the datasets.…”
Section: Space and Time Complexity (mentioning)
confidence: 99%
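The last statement notes that attributes can be discretized independently, and therefore in parallel on CPU threads as well as on GPUs. A small host-side sketch of that idea follows (one CPU thread per attribute); the equal-width binning is only a hypothetical stand-in for a real per-attribute discretizer such as CAIM's greedy cut-point search, and all names are invented for the example.

```cpp
// Hedged sketch: per-attribute discretization on CPU threads. Attributes are
// independent, so each one gets its own worker thread.
#include <algorithm>
#include <thread>
#include <vector>

// Placeholder discretizer: equal-width cut points for one attribute. A real
// implementation would run CAIM's greedy boundary search here instead.
std::vector<float> discretizeAttribute(const std::vector<float> &values, int numIntervals)
{
    auto [lo, hi] = std::minmax_element(values.begin(), values.end());
    float width = (*hi - *lo) / numIntervals;
    std::vector<float> bounds;
    for (int r = 1; r < numIntervals; ++r) bounds.push_back(*lo + r * width);
    return bounds;
}

int main()
{
    const int numAttributes = 8, numIntervals = 4;
    std::vector<std::vector<float>> data(numAttributes, std::vector<float>(1000));
    for (auto &col : data)                      // toy data, one column per attribute
        for (size_t i = 0; i < col.size(); ++i) col[i] = (float)i / col.size();

    std::vector<std::vector<float>> bounds(numAttributes);
    std::vector<std::thread> workers;
    for (int a = 0; a < numAttributes; ++a)     // one thread per attribute
        workers.emplace_back([&, a] {
            bounds[a] = discretizeAttribute(data[a], numIntervals);
        });
    for (auto &t : workers) t.join();           // all attributes discretized in parallel
    return 0;
}
```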