2016
DOI: 10.1371/journal.pone.0148195

Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

Abstract: Background: Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods: The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and…

Cited by 75 publications (46 citation statements)
References 75 publications (77 reference statements)
“…This technique has been previously used to identify biomarkers associated with depression [36] and to describe lifestyle clusters associated with depression [18] using data from the NHANES study. Depression was considered as a binary outcome and run for each key cluster using Friedman’s Multiple Additive Regression Trees (MART) boosted algorithm [37,38].…”
Section: Methods (mentioning)
confidence: 99%
“…Depression was considered as a binary outcome and run for each key cluster using Friedman’s Multiple Additive Regression Trees (MART) boosted algorithm [37,38]. Consistent with previous research using this ML algorithm on the 2009 to 2010 NHANES data [36], validation was performed using a random split of each data set into 60% training and 40% validation, a regularization shrinkage parameter of 0.001, with 50% of the residuals used to fit each successive tree (50% bagging) [37]. The maximum number of boosting interactions (i.e.…”
Section: Methods (mentioning)
confidence: 99%
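The settings quoted in the statement above (60%/40% training/validation split, shrinkage of 0.001, 50% bagging) can be illustrated with a minimal sketch of stochastic gradient boosting for a binary outcome. This is not the cited authors' code or Friedman's MART implementation: it uses only the Python standard library, fits single-split stumps with mean-residual leaf values instead of MART's leaf estimates, and runs on synthetic toy data rather than NHANES. The number of trees (`N_TREES`), the toy features, and all function names are assumptions made for illustration, since the quoted text is truncated before it gives the iteration limit.

```python
import math
import random

SHRINKAGE = 0.001      # regularization shrinkage quoted in the statement above
BAG_FRACTION = 0.5     # 50% of training rows used to fit each successive tree
TRAIN_FRACTION = 0.6   # 60% training / 40% validation split
N_TREES = 200          # hypothetical: the quoted text is truncated before this value


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals, rows):
    """Return (feature, threshold, left_mean, right_mean) with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in rows}):
            left = [residuals[i] for i in rows if X[i][j] <= t]
            right = [residuals[i] for i in rows if X[i][j] > t]
            if not left or not right:
                continue
            ml = sum(left) / len(left)
            mr = sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return None if best is None else best[1:]


def boost(X, y, rng):
    """Boost stumps on the logistic loss, fitting each one to a random half of the rows."""
    n = len(X)
    p = sum(y) / n                    # assumes both classes are present
    f0 = math.log(p / (1.0 - p))      # initial score: log-odds of the outcome
    scores = [f0] * n
    stumps = []
    for _ in range(N_TREES):
        # Residuals are the negative gradient of the logistic loss.
        residuals = [y[i] - sigmoid(scores[i]) for i in range(n)]
        rows = rng.sample(range(n), int(BAG_FRACTION * n))
        stump = fit_stump(X, residuals, rows)
        if stump is None:
            break
        j, t, ml, mr = stump
        stumps.append(stump)
        for i in range(n):
            scores[i] += SHRINKAGE * (ml if X[i][j] <= t else mr)
    return f0, stumps


def predict_proba(model, x):
    f0, stumps = model
    score = f0 + sum(SHRINKAGE * (ml if x[j] <= t else mr) for j, t, ml, mr in stumps)
    return sigmoid(score)


def run_demo(seed=0, n=100):
    """Toy run: the binary outcome depends on the first of two synthetic features."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if row[0] > 0.5 else 0 for row in X]
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(TRAIN_FRACTION * n)
    train, valid = idx[:cut], idx[cut:]
    model = boost([X[i] for i in train], [y[i] for i in train], rng)
    pos = [predict_proba(model, X[i]) for i in valid if y[i] == 1]
    neg = [predict_proba(model, X[i]) for i in valid if y[i] == 0]
    return len(train), len(valid), sum(pos) / len(pos), sum(neg) / len(neg)
```

Calling `run_demo()` returns the training and validation set sizes together with the mean predicted probability for validation rows with and without the outcome. With a shrinkage this small each tree moves the predictions only slightly, which is why such a setting is normally paired with a large number of trees.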
“…The authors conclude that big data can be used effectively for generating hypotheses. 177 Larger biomarker phenotyping projects are now underway and will help to advance our journey into the future of the neurobiology of depression.…”
Section: Big Data (mentioning)
confidence: 99%
“…Technique(s): Deep Learning (word2vec) [299] | Data type: Research Articles [299]
Depression | Technique(s): DT [303], kNN [134,298], NN [295], Regression [294,296], RF [134], SVM [134], Linear Discriminant Analysis [134] | Data type: Survey [296,303,304], Social Media [298], Electronic Health Records [295], Imaging [134,294], Biological [134,296]
Healthy Ageing | Technique(s): RF [304] | Data type: Survey [304]
Psychosis | Technique(s): SVM, Multiple Kernel Learning [297] | Data type: Imaging [297]
Schizophrenia | Technique(s): RF [291], SVM [291,293], Linear Discriminant Analysis [291], kNN [291] | Data type: Insurance [291], Imaging [293]
Substance Use | Technique(s): Topic modelling [306] | Data type: Interview [306]
Symptom Severity | Technique(s): NN [301] | Data type: Clinical Notes [301]
Wellbeing | Technique(s): BN [302], SVM [302], Deep Learning (paragraph2vec) [300], NN [307] | Data type: Clinical Notes [300,302]
As an emerging field, there are understandably significant gaps for future research to address. It is evident that the majority of papers focus on diagnosis and detection, particularly on depression, suicide risk and cognitive decline.…”
Section: Technique(s) Data Type (mentioning)
confidence: 99%