Hans Kristian Ruud scite author profile

BACKGROUND

Machine learning techniques are increasingly being applied in health research. It is not clear how useful these approaches are for modeling continuous outcomes. Child quality of life is associated with parental socioeconomic status and physical activity, and may be associated with aerobic fitness and strength. It is unclear whether diet or academic performance is associated with quality of life.

OBJECTIVE

The purpose of this study was to compare the predictive performance of machine learning techniques with that of linear regression. The comparison examined the extent to which continuous variables (physical activity, aerobic fitness, muscular strength, diet, and parental education) are predictive of academic performance and quality of life, and whether academic performance and quality of life are associated.

METHODS

We modeled data from children attending 9 schools in a quasi-experimental study. We split data randomly into training and validation sets. Curvilinear, nonlinear, and heteroscedastic variables were simulated to examine the performance of machine learning techniques compared with that of linear models, with and without imputation.

RESULTS

We included data for 1711 children. Regression models explained 24% of academic performance variance in the real complete-case validation set, and up to 15% of quality of life variance. Machine learning techniques explained high proportions of variance in training sets. In validation, however, they explained approximately 0% of academic performance variance and 3% to 8% of quality of life variance. With imputation, machine learning techniques improved to 15% for academic performance. Machine learning outperformed regression for simulated nonlinear and heteroscedastic variables.

The best predictors of academic performance in adjusted models were:
- the child's mother having a master-level education (P<.001; β=1.98, 95% CI 0.25 to 3.71)
- increased television and computer use (P=.03; β=1.19, 95% CI 0.25 to 3.71)
- dichotomized self-reported exercise (P=.001; β=2.47, 95% CI 1.08 to 3.87)

For quality of life, the best predictors were:
- self-reported exercise (P<.001; β=1.09, 95% CI 0.53 to 1.66)
- increased television and computer use (P=.002; β=−0.95, 95% CI −1.55 to −0.36)

Adjusted academic performance was associated with quality of life (P=.02; β=0.12, 95% CI 0.02 to 0.22).

CONCLUSIONS

Linear regression was less prone to overfitting and outperformed commonly used machine learning techniques. Imputation improved the performance of machine learning, but not enough to outperform regression. Machine learning techniques outperformed linear regression for modeling nonlinear and heteroscedastic relationships and may be of use in such cases. Regression with splines performed almost as well in nonlinear modeling. Lifestyle variables, including physical exercise, television and computer use, and parental education, are predictive of academic performance or quality of life. Academic performance is associated with quality of life after adjusting for lifestyle variables. It may offer another promising intervention target to improve quality of life in children.
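The published abstract reports that regression with splines performed almost as well as machine learning in nonlinear modeling. The sketch below illustrates why, under stated assumptions. It is not the authors' code or data.

It simulates a hypothetical curvilinear variable (y = x² plus Gaussian noise, with x drawn uniformly from −2 to 2). It then fits ordinary least squares twice: once with a straight line, and once with a linear spline (a truncated power basis with knots at −1, 0 and 1, chosen arbitrarily). Finally, it compares R² on a separate validation sample. All function names are illustrative, and only the Python standard library is used.

```python
import random

def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def ols(rows, y):
    """Ordinary least squares via the normal equations X'X w = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def r2(y, yhat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def line_basis(x):
    return [1.0, x]

def spline_basis(x, knots=(-1.0, 0.0, 1.0)):
    # Linear spline: intercept, x, and one hinge term max(0, x - knot) per knot.
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def predict(w, basis, xs):
    return [sum(wi * bi for wi, bi in zip(w, basis(x))) for x in xs]

def simulate(n, rng):
    # Hypothetical curvilinear relationship: y = x^2 + N(0, 0.5^2) noise.
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

rng = random.Random(1)
x_train, y_train = simulate(500, rng)
x_valid, y_valid = simulate(500, rng)

results = {}
for name, basis in [("straight line", line_basis), ("linear spline", spline_basis)]:
    w = ols([basis(x) for x in x_train], y_train)
    results[name] = r2(y_valid, predict(w, basis, x_valid))
    print(f"{name}: validation R^2 = {results[name]:.2f}")
```

The straight line captures almost none of the symmetric curvature, while the spline recovers most of it. Knot placement is a modeling choice; more or better-placed knots trade flexibility against overfitting.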


Relative Performance of Machine Learning and Linear Regression in Predicting Quality of Life and Academic Performance of School Children in Norway: Data Analysis of a Quasi-Experimental Study (Preprint)

Froud, Hansen, Ruud, et al. 2020

Preprint


BACKGROUND

Machine learning (ML) approaches are increasingly being used in health research. It is not clear how useful these approaches are for modelling continuous health outcomes. Child quality of life (QoL) is associated with parental socioeconomic status and child activity levels, and may be associated with aerobic fitness and strength. It is not clear whether diet or academic performance (AP) is associated with QoL.

OBJECTIVE

To compare predictive performances of ML approaches with linear regression for modelling QoL and AP using parental education and lifestyle data.

METHODS

We modelled data from children attending nine schools in a quasi-experimental study (NCT02495714). We split data randomly into training and validation sets, and simulated curvilinear, non-linear, and heteroscedastic variables. We examined the relative performance of ML approaches using R². We compared them with mixed and fixed models and with regression with splines, with and without imputation. We also examined the effect of training set size on overfitting.
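The split-and-compare design described above can be illustrated with a small simulation. This is a hedged sketch, not the study's pipeline, and it rests on three assumptions:
- The data are synthetic: a weak linear signal with a population R² of 0.2.
- The 50/50 split is arbitrary.
- 1-nearest-neighbour regression stands in for a flexible ML model, chosen only because it memorises the training data.

The sketch shows how training-set R² can look excellent while validation R² collapses, the pattern the results describe.

```python
import random

def r_squared(y, yhat):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def nn1_predict(train_x, train_y, xs):
    """1-nearest-neighbour regression: copy the outcome of the closest training point."""
    preds = []
    for x in xs:
        i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        preds.append(train_y[i])
    return preds

rng = random.Random(42)
# Hypothetical data: signal variance 0.25, noise variance 1.0, so population R^2 = 0.2.
data = []
for _ in range(1000):
    x = rng.gauss(0, 1)
    data.append((x, 0.5 * x + rng.gauss(0, 1.0)))

# Random split into training and validation halves.
rng.shuffle(data)
train, valid = data[:500], data[500:]
tx, ty = [p[0] for p in train], [p[1] for p in train]
vx, vy = [p[0] for p in valid], [p[1] for p in valid]

a, b = fit_line(tx, ty)
scores = {
    "linear": (r_squared(ty, [a + b * x for x in tx]),
               r_squared(vy, [a + b * x for x in vx])),
    "1-NN": (r_squared(ty, nn1_predict(tx, ty, tx)),
             r_squared(vy, nn1_predict(tx, ty, vx))),
}
for name, (tr, va) in scores.items():
    print(f"{name}: training R^2 = {tr:.2f}, validation R^2 = {va:.2f}")
```

The 1-NN model scores a perfect training R² because each training point is its own nearest neighbour. On new data it carries two noise terms instead of one, so its validation R² falls below that of the simple line, and can even fall below zero.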

RESULTS

We had 1,711 cases. Using real data, our regression models explained 24% of AP variance in the complete-case validation set, and up to 15% of QoL variance. ML models explained high proportions of variance in training sets. In validation sets, however, they explained ~0% of AP variance and between 3% and 8% of QoL variance. Following imputation, ML models explained up to 15% of AP variance. ML models outperformed regression only for modelling simulated non-linear and heteroscedastic variables. A smaller training set did not lead to increased overfitting.

The best predictors of QoL were:
- 7-point self-reported activity (P<.001; β=1.09 (95% CI 0.53 to 1.66))
- TV/computer use (P=.002; β=-0.95 (-1.55 to -0.36))

For AP, the best predictors were:
- mother having master's-level education (P<.001; β=1.98 (0.25 to 3.71))
- dichotomised self-reported activity (P=.001; β=2.47 (1.08 to 3.87))

Adjusted AP was associated with QoL (P=.02; β=0.12 (0.02 to 0.22)).

CONCLUSIONS

Two factors are associated with changes in child QoL. Exercising enough to sweat once per week is associated with a small-to-medium increase, and 2 hours per day of TV or computer use with a small-to-medium decrease. An increase in AP of 20 units is associated with a small increase in QoL. Three factors are each associated with small-to-medium increases in AP:
- a mother having higher- or master's-level education
- 2 hours per day of TV or computer use
- taking at least 2 hours of exercise

Differences between the effects of computer/TV use for work and for leisure need further investigation.

Linear regression is less prone to overfitting and performs better than ML in predicting continuous health outcomes in a dataset containing missing data. Imputation improves ML performance, but not enough to outperform regression. ML outperformed regression with non-linear and heteroscedastic data. It may be of use when such relationships exist and where imputation is sensible or there are no missing data.
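The 20-unit statement in the conclusions follows from scaling the adjusted estimate reported in the results: β = 0.12 QoL units per AP unit (95% CI 0.02 to 0.22). This assumes the association is linear across the range of AP. That is an assumption, since the abstract reports only the per-unit estimate.

$$\Delta \mathrm{QoL} = \beta \, \Delta \mathrm{AP} = 0.12 \times 20 = 2.4, \qquad 95\%\ \mathrm{CI}:\ 0.02 \times 20 = 0.4\ \text{to}\ 0.22 \times 20 = 4.4$$

This is simple arithmetic on the reported numbers, not an additional result. Whether 2.4 units counts as "small" depends on the QoL scale, which the abstract does not state.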

CLINICALTRIAL

The data come from a quasi-experimental study, not an RCT. The study is nevertheless registered: NCT02495714


Search for good examples of Hall’s conjecture

2018


Consider the equation

$$x^3 - y^2 = k, \qquad (*)$$

where $x, y \in \mathbb{N}$ and $k \in \mathbb{Z}$. It is easy to see that (*) has infinitely many solutions where $k = 0$: let $x = t^2$ and $y = t^3$, where $t$ is a natural number. It turns out that (*) has only finitely many solutions in $x$ and $y$ when $k$ is a given nonzero integer. Moreover, it is hard to find solutions of (*) where $k$ is small compared to $x$ and $y$.

Hall's Conjecture states that there exists a constant $C$ such that for any solution of (*) where $k \neq 0$, we have $C\sqrt{x} < |k|$. For more on Hall's Conjecture, see [1] and [3].

Hall's Conjecture is neither proved nor disproved. To shed some light on the conjecture, researchers have tried to find solutions of (*) where $0 < |k| < \sqrt{x}$. We will refer to such solutions as good examples of Hall's Conjecture. We will say that $(x, y, k)$ is a good triplet when $x, y \in \mathbb{N}$, $k = x^3 - y^2$ and $0 < |k| < \sqrt{x}$.

This paper is a preliminary report on our search for new good examples of Hall's Conjecture. We present a new algorithm that will detect all good examples
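The definition of a good triplet can be made concrete with a naive brute-force search in Python. This sketch assumes the equation has the form x³ − y² = k. It is not the new algorithm the paper presents, which the abstract does not describe. It also becomes impractical at the very large x where new good examples are sought.

For each x, only the two perfect squares nearest to x³ can give a small |k|. Because |k| and x are integers, the test |k| < √x is done exactly as k² < x, avoiding floating-point square roots.

```python
from math import isqrt

def good_triplets(limit):
    """Naive search for good triplets (x, y, k) with 1 <= x <= limit.

    A triplet is good when k = x**3 - y**2 and 0 < |k| < sqrt(x).
    """
    found = []
    for x in range(1, limit + 1):
        cube = x ** 3
        root = isqrt(cube)  # largest y with y*y <= x**3
        # Any other y is at least about 2*x**1.5 away from x**3, far above sqrt(x).
        for y in (root, root + 1):
            k = cube - y * y
            if k != 0 and k * k < x:  # exact integer form of 0 < |k| < sqrt(x)
                found.append((x, y, k))
    return found

for x, y, k in good_triplets(6000):
    print(x, y, k)
```

The first triplet found is (2, 3, -1), since 2³ − 3² = −1 and 1 < √2. The search also finds (5234, 378661, -17), because 5234³ − 378661² = −17 and 17 < √5234 ≈ 72.3.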

