Applied Predictive Modeling

Kühn, Max; Johnson, Kjell

doi:10.1007/978-1-4614-6849-3

Cited by 3,643 publications

(3,469 citation statements)

References 0 publications

Supporting

Mentioning

3,334

Contrasting

Unclassified

122

Order By: Relevance

“…For each study area, random forests (Liaw and Wiener, 2002; parameters mtry = default and 299 ntree = 1000) was used to calculate covariate importance, as random forests is not highly sensitive to 300 non-informative predictors (Kuhn and Johnson, 2013). Random forests identifies important covariates 301 by generating multiple classification trees (a forest) using bootstrap sampling, randomly scrambling the 302 covariates in each bootstrap sample and reclassifying the bootstrap sample.…”

Section: Utah Clhs 207 208mentioning

confidence: 99%

Machine learning for predicting soil classes in three semi-arid landscapes

et al. 2015

View full text Add to dashboard Cite

any, machine learning model and covariate set might be optimal for predicting soil classes across 23 different landscapes. 24Our objective was to compare multiple machine learning models and covariate sets for predicting soil 25 taxonomic classes at three geographically distinct areas in the semi-arid western United States of 26 America (southern New Mexico, southwestern Utah, and northeastern Wyoming). All three areas were 27 the focus of digital soil mapping studies. Sampling sites at each study area were selected using 28 conditioned Latin hypercube sampling (cLHS). We compared models that had been used in other DSM 29 studies, including clustering algorithms, discriminant analysis, multinomial logistic regression, neural 30 networks, tree based methods, and support vector machine classifiers. Tested machine learning models 31 were divided into three groups based on model complexity: simple, moderate, and complex. We also 32

show abstract

Section: Utah Clhs 207 208mentioning

confidence: 99%

Machine learning for predicting soil classes in three semi-arid landscapes

et al. 2015

View full text Add to dashboard Cite

show abstract

“…The predictive power in the data may depend significantly on the way missing values are treated. While some machine learning algorithms, such as decision trees [16], have the capability to handle missing data outright, most machine learning algorithms do not. In many situations missing values are imputed using a supervised learning technique such as k-Nearest Neighbour (KNN) after suitable scaling to balance the contribution of the numeric attributes.…”

Section: Imputationmentioning

confidence: 99%

“…These imputation techniques do not have theoretical formulations but have been much implemented in practice [4] [6]. In this work, we considered different imputations such as the KNN imputation, the tree bagging imputation from the caret package [16], and the random forest imputation from the randomForest package [17]. The last method led to the best results in terms of the performance of the predictive models finally built, although it was more computationally expensive.…”

Section: Imputationmentioning

confidence: 99%

“…the number of hidden units) and 10 values of the decay (i.e. the weight decay), which is the parameter in the penalization method for model regularization to avoid overfitting, similar to the penalization method in ridge regression, based on the L2 norm [16]. The optimal values were 3 and 0.01, respectively.…”

Section: Training and Optimizing (Tuning) Predictive Modelsmentioning

confidence: 99%

“…When there is a priori knowledge of a class imbalance, one direct method to reduce its influence on model training is to select training set samples to have roughly equal event rates [16]. Treating data imbalance usually leads to better predictions models and better trade-off between sensitivity and specificity.…”

Section: Sampling and Post-processing K-fold Cross-testingmentioning

confidence: 99%

See 2 more Smart Citations

Predicting First-Episode Psychosis Associated with Cannabis Use with Artificial Neural Networks and Deep Learning

Stamate

Alghamdi

Ståhl

et al. 2018

Communications in Computer and Information Science

View full text Add to dashboard Cite

Abstract.In recent years, a number of researches started to investigate the existence of links between cannabis use and psychotic disorder. More recently, artificial neural networks and in particular deep learning have set a revolutionary wave in pattern recognition and machine learning. This study proposes a novel machine learning approach based on neural network and deep learning algorithms, to developing highly accurate predictive models for the onset of first-episode psychosis. Our approach is based also on a novel methodology of optimising and post-processing the predictive models in a computationally intensive framework. A study of the trade-off between the volume of the data and the extent of uncertainty due to missing values, both of which influencing the predictive performance, enhanced this approach. Furthermore, we extended our approach by proposing and encapsulating a novel post-processing k-fold cross-testing method in order to further optimise, and test these models. The results show that the average accuracy in predicting first-episode psychosis achieved by our models in intensive Monte Carlo simulation, is about 89%.

show abstract

Machine Learning–Based Deep Phenotyping of Atopic Dermatitis

et al. 2021

View full text Add to dashboard Cite

IMPORTANCE Atopic dermatitis (AD) is the most common chronic inflammatory skin disease and is driven by a complex pathophysiology underlying highly heterogeneous phenotypes. Current advances in precision medicine emphasize the need for stratification. OBJECTIVE To perform deep phenotyping and identification of severity-associated factors in adolescent and adult patients with AD. DESIGN, SETTING, AND PARTICIPANTS Cross-sectional data from the baseline visit of a prospective longitudinal study investigating the phenotype among inpatients and outpatients with AD from the Department of Dermatology and Allergy of the University Hospital Bonn enrolled between November 2016 and February 2020.MAIN OUTCOMES AND MEASURES Patients were stratified by severity groups using the Eczema Area and Severity Index (EASI). The associations of 130 factors with AD severity were analyzed applying a machine learning-gradient boosting approach with cross-validationbased tuning as well as multinomial logistic regression.RESULTS A total of 367 patients (157 male [42.8%]; mean [SD] age, 39 [17] years; 94% adults) were analyzed. Among the participants, 177 (48.2%) had mild disease (EASI Յ7), 120 (32.7%) had moderate disease (EASI >7 and Յ 21), and 70 (19.1%) had severe disease (EASI >21).

show abstract

Applied Predictive Modeling

Cited by 3,643 publications

References 0 publications

Machine learning for predicting soil classes in three semi-arid landscapes

Machine learning for predicting soil classes in three semi-arid landscapes

Predicting First-Episode Psychosis Associated with Cannabis Use with Artificial Neural Networks and Deep Learning

Machine Learning–Based Deep Phenotyping of Atopic Dermatitis

Contact Info

Product

Resources

About