Background
The search for neuroimaging biomarkers of alcohol use disorder (AUD) has primarily been restricted to significance testing in small datasets of low diversity. To identify neurobiological markers beyond individual differences, it may be useful to develop classification models for AUD. The ever-increasing quantity of neuroimaging data demands methods that are robust to the complexities of multi-site designs and are generalizable to data from new scanners.
Methods
This study is a mega-analysis of previously published datasets from 2,034 AUD and comparison participants spanning 27 sites, coordinated by the ENIGMA Addiction Working Group. Data were divided into a training set of 1,652 participants (692 AUD, 24 sites) and a test set of 382 participants (146 AUD, 3 sites). A battery of machine learning classifiers was evaluated using repeated random cross-validation (CV) and leave-site-out CV. Area under the receiver operating characteristic curve (AUC) was our base metric of performance.
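To make the evaluation scheme concrete, the following is a minimal sketch of leave-site-out CV scored by AUC, written in plain Python. It is not the study's pipeline: the function names (`auc`, `leave_site_out_splits`, `leave_site_out_auc`) and the `fit`/`score` callables are hypothetical placeholders for whichever classifier is being evaluated. Skipping sites that contain only one class is likewise an assumption made here, because AUC is undefined for such sites.

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_site_out_splits(sites):
    """Yield (held_out_site, train_idx, test_idx), one site held out at a time."""
    by_site = defaultdict(list)
    for i, s in enumerate(sites):
        by_site[s].append(i)
    for s in sorted(by_site):
        train_idx = [i for i in range(len(sites)) if sites[i] != s]
        yield s, train_idx, by_site[s]


def leave_site_out_auc(X, y, sites, fit, score):
    """Return {site: AUC}, each site scored by a model that never saw it.

    `fit(X_train, y_train)` returns a model; `score(model, x)` returns a
    number that is larger for more AUD-like participants. Both are
    hypothetical hooks, not part of any particular library.
    """
    results = {}
    for site, train_idx, test_idx in leave_site_out_splits(sites):
        y_test = [y[i] for i in test_idx]
        if len(set(y_test)) < 2:
            continue  # assumption: skip single-class sites (e.g. controls only)
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        results[site] = auc(y_test, [score(model, X[i]) for i in test_idx])
    return results
```

Unlike repeated random CV, every held-out fold here comes from a site the model was not trained on, so the per-site AUCs approximate performance on data from a new scanner.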
Results
Multi-objective evolutionary search was conducted to identify sparse, generalizable, and high-performing subsets of brain measurements. Cortical thickness in the left superior frontal gyrus and right lateral orbitofrontal cortex, cortical surface area in the right transverse temporal gyrus, and left putamen volume appeared most frequently across searches. Restricting a regularized logistic regression model to these four features yielded a test-set AUC of .768.
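As an illustration of what "multi-objective" selection implies, the sketch below keeps only non-dominated feature subsets and then counts how often each feature appears across searches. It assumes just two objectives (fewer features and higher CV AUC) and omits the evolutionary loop itself. The abstract does not specify the objectives or the algorithm in this detail, so the function names and the candidate format are hypothetical.

```python
from collections import Counter


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one.

    Each candidate is (feature_tuple, auc). Assumed objectives: fewer
    features is better, higher AUC is better.
    """
    (feats_a, auc_a), (feats_b, auc_b) = a, b
    no_worse = len(feats_a) <= len(feats_b) and auc_a >= auc_b
    better = len(feats_a) < len(feats_b) or auc_a > auc_b
    return no_worse and better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


def feature_frequency(fronts):
    """Count how often each feature appears in the fronts of several searches."""
    counts = Counter()
    for front in fronts:
        for features, _ in front:
            counts.update(set(features))
    return counts


# Toy candidates with placeholder feature names and made-up AUCs.
candidates = [
    (("f1", "f2"), 0.70),
    (("f1",), 0.65),
    (("f1", "f2", "f3"), 0.69),  # dominated: more features, lower AUC
    (("f2",), 0.60),             # dominated: same size, lower AUC
]
front = pareto_front(candidates)
counts = feature_frequency([front])
```

Ranking features by `counts` over many independent searches is one simple way to arrive at a short list of recurring measurements.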
Conclusions
Developing classification models on multi-site data with varied underlying class distributions poses unique challenges. Supplementing datasets with controls from new sites and performing feature selection both increase generalizability. The four features identified by evolutionary search may serve as specific biomarkers for individuals with current AUD.