A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning

López-Arévalo, Iván; Aldana-Bobadilla, Edwin; Molina-Villegas, Alejandro; Galeana-Zapién, Hiram; Muñiz-Sánchez, Víctor; Gausin-Valle, Saul

doi:10.3390/e22121391

Cited by 31 publications

(15 citation statements)

References 47 publications

“…Broadly speaking, ML systems operate at two processes, namely the learning (used for training) and testing. In order to facilitate the former process, these features commonly form a feature vector that can be binary, numeric, ordinal, or nominal [36]. This vector is utilized as an input within the learning phase.…”

Section: Introduction (mentioning)

confidence: 99%

Machine Learning in Agriculture: A Comprehensive Updated Review

Benos

Tagarakis

Dolias

et al. 2021

Sensors

The digital transformation of agriculture has evolved various aspects of management into artificial intelligent systems for the sake of making value from the ever-increasing data originated from numerous sources. A subset of artificial intelligence, namely machine learning, has a considerable potential to handle numerous challenges in the establishment of knowledge-based farming systems. The present study aims at shedding light on machine learning in agriculture by thoroughly reviewing the recent scholarly literature based on keywords’ combinations of “machine learning” along with “crop management”, “water management”, “soil management”, and “livestock management”, and in accordance with PRISMA guidelines. Only journal papers were considered eligible that were published within 2018–2020. The results indicated that this topic pertains to different disciplines that favour convergence research at the international level. Furthermore, crop management was observed to be at the centre of attention. A plethora of machine learning algorithms were used, with those belonging to Artificial Neural Networks being more efficient. In addition, maize and wheat as well as cattle and sheep were the most investigated crops and animals, respectively. Finally, a variety of sensors, attached on satellites and unmanned ground and aerial vehicles, have been utilized as a means of getting reliable input data for the data analyses. It is anticipated that this study will constitute a beneficial guide to all stakeholders towards enhancing awareness of the potential advantages of using machine learning in agriculture and contributing to a more systematic research on this topic.

“…In many statistical fields, but above all in modern machine learning, as far as the availability of data sources increases, methods must be flexible enough to be applied to any sort of data, from numerical to categorical, taking in due account the mixed nature of the data [18]. One of the main problems when dealing with mixed data is to maintain adequate robustness in the estimators or in the data representation.…”

Section: Discussion (mentioning)

confidence: 99%

Dynamic Mixed Data Analysis and Visualization

Grané

Manzi

Salini

2022

Entropy

One of the consequences of the big data revolution is that data are more heterogeneous than ever. A new challenge appears when mixed-type data sets evolve over time and we are interested in the comparison among individuals. In this work, we propose a new protocol that integrates robust distances and visualization techniques for dynamic mixed data. In particular, given a time t∈T={1,2,…,N}, we start by measuring the proximity of n individuals in heterogeneous data by means of a robustified version of Gower’s metric (proposed by the authors in a previous work), yielding a collection of distance matrices {D(t),∀t∈T}. To monitor the evolution of distances and outlier detection over time, we propose several graphical tools: first, we track the evolution of pairwise distances via line graphs; second, a dynamic box plot is obtained to identify individuals that showed minimum or maximum disparities; third, to visualize individuals that are systematically far from the others and detect potential outliers, we use the proximity plots, which are line graphs based on a proximity function computed on {D(t),∀t∈T}; fourth, the evolution of the inter-distances between individuals is analyzed via dynamic multiple multidimensional scaling maps. These visualization tools were implemented in a Shiny application in R, and the methodology is illustrated on a real data set related to COVID-19 healthcare, policy and restriction measures about the 2020–2021 COVID-19 pandemic across EU Member States.

“…The admission diagnosis was also included as patients in the ICU have a diverse set of underlying diagnoses; therefore, such a feature may affect laboratory test results. Categorical variables (sex and admission diagnosis) were coded using an approach that maps categories into numeric data using entropy, as presented in the study by Lopez-Arevalo et al [25].…”

Section: Methods (mentioning)

confidence: 99%

Predicting Abnormal Laboratory Blood Test Results in the Intensive Care Unit Using Novel Features Based on Information Theory and Historical Conditional Probability: Observational Study

Valderrama

Niven

Stelfox

et al. 2022

JMIR Med Inform

Background: Redundancy in laboratory blood tests is common in intensive care units (ICUs), affecting patients’ health and increasing health care expenses. Medical communities have made recommendations to order laboratory tests more judiciously. Wise selection can rely on modern data-driven approaches that have been shown to help identify low-yield laboratory blood tests in ICUs. However, although conditional entropy and conditional probability distribution have shown the potential to measure the uncertainty of yielding an abnormal test, no previous studies have adapted these techniques to include them in machine learning models for predicting abnormal laboratory test results.

Objective: This study aimed to address the limitations of previous reports by adapting conditional entropy and conditional probability to extract features for predicting abnormal laboratory blood test results.

Methods: We used an ICU data set collected across Alberta, Canada, which included 55,689 ICU admissions from 48,672 patients. We investigated the features of conditional entropy and conditional probability by comparing the performances of 2 machine learning approaches for predicting normal and abnormal results for 18 blood laboratory tests. Approach 1 used patients’ vitals, age, sex, and admission diagnosis as features. Approach 2 used the same features plus the new conditional entropy–based and conditional probability–based features. Both approaches used 4 different machine learning models (fuzzy model, logistic regression, random forest, and gradient boosting trees) and 10 metrics (sensitivity, specificity, accuracy, precision, negative predictive value [NPV], F1 score, area under the curve [AUC], precision-recall AUC, mean G, and index balanced accuracy) to assess the performance of the approaches.

Results: Approach 1 achieved an average AUC of 0.86 for all 18 laboratory tests across the 4 models (sensitivity 78%, specificity 84%, precision 82%, NPV 75%, F1 score 79%, and mean G 81%), whereas approach 2 achieved an average AUC of 0.89 (sensitivity 84%, specificity 84%, precision 83%, NPV 81%, F1 score 83%, and mean G 84%). We found that the inclusion of the new features resulted in significant differences for most of the metrics in favor of approach 2. Sensitivity significantly improved for 8 and 15 laboratory tests across the different classifiers (minimum P<.001 and maximum P=.04). Mean G and index balanced accuracy, which are balanced performance metrics, also improved significantly across the classifiers for 6 to 10 and 6 to 11 laboratory tests. The most relevant feature was the pretest probability feature, which is the probability that a test result was normal when a certain number of consecutive prior tests was already normal.

Conclusions: The findings suggest that conditional entropy–based features and pretest probability improve the capacity to discriminate between normal and abnormal laboratory test results. Detecting the next laboratory test result is an intermediate step toward developing guidelines for reducing overtesting in the ICU.
