2020
DOI: 10.3390/e22121391
|View full text |Cite
|
Sign up to set email alerts
|

A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning

Abstract: The most common machine-learning methods solve supervised and unsupervised problems based on datasets where the problem’s features belong to a numerical space. However, many problems often include data where numerical and categorical data coexist, which represents a challenge to manage them. To transform categorical data into a numeric form, preprocessing tasks are compulsory. Methods such as one-hot and feature-hashing have been the most widely used encoding approaches at the expense of a significant increase… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
15
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
7
3

Relationship

0
10

Authors

Journals

citations
Cited by 31 publications
(15 citation statements)
references
References 47 publications
0
15
0
Order By: Relevance
“…Broadly speaking, ML systems operate at two processes, namely the learning (used for training) and testing. In order to facilitate the former process, these features commonly form a feature vector that can be binary, numeric, ordinal, or nominal [ 36 ]. This vector is utilized as an input within the learning phase.…”
Section: Introductionmentioning
confidence: 99%
“…Broadly speaking, ML systems operate at two processes, namely the learning (used for training) and testing. In order to facilitate the former process, these features commonly form a feature vector that can be binary, numeric, ordinal, or nominal [ 36 ]. This vector is utilized as an input within the learning phase.…”
Section: Introductionmentioning
confidence: 99%
“…In many statistical fields, but above all in modern machine learning, as far as the availability of data sources increases, methods must be flexible enough to be applied to any sort of data, from numerical to categorical, taking in due account the mixed nature of the data [ 18 ]. One of the main problems when dealing with mixed data is to maintain adequate robustness in the estimators or in the data representation.…”
Section: Discussionmentioning
confidence: 99%
“…The admission diagnosis was also included as patients in the ICU have a diverse set of underlying diagnoses; therefore, such a feature may affect laboratory test results. Categorical variables (sex and admission diagnosis) were coded using an approach that maps categories into numeric data using entropy, as presented in the study by Lopez-Arevalo et al [ 25 ].…”
Section: Methodsmentioning
confidence: 99%