In the version of this article originally published, the phrase "This dataset contained 1,739 cases (27 cancer-positives)" in the main text was in error: the number 1,739 should have been 1,139. The same error appeared in the Fig. 4c legend, in the phrase "comprising n = 1,739 cases", and in the Extended Data Fig. 5 legend, in the phrase "AUC curve for the independent data test set with n = 1,739 cases"; in both, 1,739 should have been 1,139. The errors have been corrected in the HTML and PDF versions of this article.
Widely popular transformer-based NLP models such as BERT and GPT have enormous capacity, trending toward billions of parameters. Current execution methods demand brute-force resources such as HBM devices and high-speed interconnects for data parallelism. In this paper, we introduce a new relay-style execution technique called L2L (layer-to-layer) in which, at any given moment, device memory is populated primarily with the footprint of the executing layer(s). The model resides in DRAM attached to either a CPU or an FPGA, in an entity we call the eager param-server (EPS). Unlike a traditional param-server, the EPS transmits the model piecemeal to the devices, allowing it to perform other tasks in the background such as reduction and distributed optimization. To overcome the bandwidth cost of shuttling parameters to and from the EPS, the model is executed one layer at a time across many micro-batches, instead of the conventional method of running minibatches over the whole model. We explore a conservative version of L2L, implemented on a modest Azure instance, that runs BERT-Large with a batch size of 32 on a single V100 GPU using less than 8 GB of memory. Our results show a more stable learning curve, faster convergence, better accuracy, and a 35% reduction in memory compared to the state-of-the-art baseline. Our method reproduces BERT results on mid-range GPUs, which was hitherto not feasible. L2L scales to arbitrary depth without increasing memory or device requirements, allowing researchers to develop affordable devices, and it enables dynamic approaches such as neural architecture search. This work was performed first on GPUs but also targets high-TFLOPS/Watt accelerators such as Graphcore IPUs. The code will soon be available on GitHub.
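The relay-style idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the class and function names (EagerParamServer, l2l_forward) are hypothetical, the "layers" are scalar affine maps, and the host-to-device copy is simulated by a dict copy. The key point it demonstrates is the L2L execution order: one layer is fetched at a time and run over all micro-batches before the next layer replaces it.

```python
# Toy sketch of L2L relay-style execution (hypothetical names; the real
# system streams transformer layers from host DRAM to GPU/accelerator memory).

def make_layer(weight, bias):
    """A toy affine layer: y = weight * x + bias (scalars for simplicity)."""
    return {"w": weight, "b": bias}

class EagerParamServer:
    """Host-side store that streams the model to the device piecemeal."""
    def __init__(self, layers):
        self.layers = layers          # the full model stays in host DRAM

    def fetch(self, i):
        return dict(self.layers[i])   # simulate a host->device copy of one layer

def l2l_forward(eps, micro_batches):
    acts = list(micro_batches)        # activations for every micro-batch
    for i in range(len(eps.layers)):
        layer = eps.fetch(i)          # device holds only this layer's params
        # run the resident layer over ALL micro-batches before fetching the next,
        # amortizing the transfer cost of the layer across the whole batch
        acts = [layer["w"] * x + layer["b"] for x in acts]
    return acts

eps = EagerParamServer([make_layer(2.0, 1.0), make_layer(0.5, 0.0)])
print(l2l_forward(eps, [1.0, 2.0]))   # -> [1.5, 2.5]
```

Note the inverted loop nesting compared with conventional minibatch execution (layers outer, micro-batches inner rather than the reverse); this is what keeps only one layer's footprint resident on the device at a time.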
Early detection of age-related diseases will benefit greatly from a model of the underlying biological aging process. In this paper, we develop a brain-age predictor using structural magnetic resonance imaging (SMRI) and deep learning, and we evaluate the predicted brain age as a marker of brain aging in Alzheimer's disease. Our approach requires no domain knowledge in that it trains end-to-end on the SMRI image itself, and it has been validated on real SMRI data collected from elderly subjects. We developed two models, based on convolutional neural network (CNN) regression and on bucket classification, to predict brain ages from SMRI images. Our models achieved root mean squared errors (RMSE) of 5.54 and 6.44 years in predicting the brain ages of healthy subjects. Further analysis showed a substantial difference between the predicted brain ages of cognitively impaired and healthy subjects with similar chronological ages.
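The two prediction heads and the reported error metric can be illustrated with a minimal sketch. Everything here is assumed for illustration: the function names, the bucket range and 5-year bin width, and the sample ages are hypothetical, not taken from the paper. It shows the RMSE computation and the idea of bucket classification, where ages are discretized into bins and the predicted bin's center is read back as the age estimate.

```python
# Hypothetical sketch: regression vs. bucket-classification age prediction.
import math

def rmse(pred, true):
    """Root mean squared error between predicted and true ages."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

def age_to_bucket(age, lo=50, width=5):
    """Discretize an age into 5-year bins starting at age 50 (assumed bins)."""
    return int((age - lo) // width)

def bucket_to_age(bucket, lo=50, width=5):
    """Read the bin center back out as the age estimate."""
    return lo + width * bucket + width / 2.0

true_ages = [62.0, 71.0, 68.0]                 # made-up chronological ages
reg_preds = [60.0, 75.0, 66.0]                 # made-up regression outputs
cls_preds = [bucket_to_age(age_to_bucket(a)) for a in true_ages]
print(round(rmse(reg_preds, true_ages), 2))    # -> 2.83
```

Bucket classification trades fine-grained resolution (errors are bounded below by the bin width) for a classification loss, which can be easier to optimize than direct regression on some imaging tasks.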
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.