Some forms of mild cognitive impairment (MCI) can be the clinical precursor of severe dementia such as Alzheimer's disease (AD), while other types of MCI tend to remain stable over time and do not progress to AD pathology. To choose an effective and personalized treatment for AD, we need to identify which MCI patients are at risk of developing AD and which are not. Here, we present a novel deep learning architecture, based on dual learning and an ad hoc layer for 3D separable convolutions, which aims to identify those people with MCI who have a high likelihood of developing AD. Our deep learning procedures combine structural magnetic resonance imaging (MRI), demographic, neuropsychological, and APOe4 genotyping data as input measures. The most novel characteristics of our machine learning model compared to previous ones are as follows: 1) multi-tasking, in the sense that our deep learning model jointly learns to predict both MCI-to-AD conversion and AD vs. healthy classification, which facilitates the extraction of features relevant for prognostication; 2) the neural network classifier employs relatively few parameters compared to other deep learning architectures (we use ~550,000 network parameters, orders of magnitude fewer than other network designs) without compromising network complexity, and hence significantly limits overfitting; 3) both structural MRI images and warp field characteristics, which quantify the amount of volumetric change relative to a common template, were used as separate input streams to extract as much information as possible from the MRI data. All the analyses were performed on a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, for a total of n=785 participants (192 AD, 409 MCI, and 184 healthy controls (HC)).
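To give a rough sense of why separable convolutions keep the parameter count low, the sketch below compares a standard 3D convolution with a depthwise-separable one. It assumes the common depthwise-plus-pointwise factorization, which may differ from the ad hoc layer used in this work, and the channel counts and kernel size are hypothetical values chosen only for illustration.

```python
def conv3d_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard 3D convolution with a k x k x k kernel."""
    return k ** 3 * c_in * c_out + (c_out if bias else 0)


def separable_conv3d_params(c_in, c_out, k, bias=True):
    """Parameter count of a depthwise-separable 3D convolution.

    Assumption: one k x k x k filter per input channel (depthwise step),
    followed by a 1 x 1 x 1 convolution that mixes channels (pointwise step).
    """
    depthwise = k ** 3 * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer: 32 input channels, 64 output channels, 3 x 3 x 3 kernel.
standard = conv3d_params(32, 64, 3)
separable = separable_conv3d_params(32, 64, 3)
print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.1f}x")
```

For this hypothetical layer the separable variant needs roughly an eighteenth of the parameters, and the gap widens as the number of output channels grows.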
We found that the most predictive combination of inputs included the structural MRI images and the demographic, neuropsychological, and APOe4 data, while the warp field metric added little predictive value. We achieved an area under the ROC curve (AUC) of 0.925 with a 10-fold cross-validated accuracy of 86%, a sensitivity of 87.5%, and a specificity of 85% in distinguishing MCI patients who developed AD within three years from those whose MCI remained stable over the same time period. To the best of our knowledge, this is the highest test-set performance reported in the literature using similar data. The same network provided an AUC of 1 and 100% accuracy, sensitivity, and specificity when distinguishing HC from AD. We also demonstrated that our classification framework was robust to different co-registration templates and to possibly irrelevant features and image sections. Our approach is flexible and can in principle integrate other imaging modalities, such as PET, and a more diverse set of clinical data. The convolutional framework is potentially applicable to any 3D image dataset and gives the flexibility to design a computer-aided diagnosis system targeting the prediction of any medical condition utilizing multi-modal imaging a...