Advances in applying machine learning techniques to neuroimaging have encouraged scientists to build models for the early diagnosis of brain disorders such as Alzheimer's disease. Predicting the various stages of Alzheimer's disease is challenging; however, complex deep learning techniques can perform such predictions. Therefore, novel architectures with lower complexity but efficient pattern-extraction capabilities, such as transformers, have been of interest to neuroscientists. This study introduces an optimized vision transformer architecture to predict the effects of aging in healthy adults (>75 years), mild cognitive impairment, and Alzheimer's disease brains within the same age group, using resting-state functional and anatomical magnetic resonance imaging data. Our optimized architecture, known as OViTAD and currently the sole vision transformer-based end-to-end pipeline, outperformed existing transformer models and most state-of-the-art solutions, achieving F1-scores of 97%±0.0 and 99.55%±0.39 on the test sets for the two modalities in the triple-class prediction experiments, while reducing the number of trainable parameters by 30% compared to a vanilla vision transformer. To ensure the robustness and reproducibility of our optimized vision transformer, we repeated the modeling process three times for all experiments and report the averaged evaluation metrics. Furthermore, we implemented a visualization technique to illustrate the effect of global attention on brain images. We also exhaustively implemented models to explore the impact of combining healthy brains with the two other groups in both modalities. This study could open a new avenue for adopting and optimizing vision transformers for neuroimaging applications, especially Alzheimer's disease prediction.
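The abstract does not specify the OViTAD architecture itself. As a rough illustration of the vision-transformer pattern it refers to (patch embedding, a class token, global self-attention, and a classification head), the following NumPy sketch shows a minimal, hypothetical single-head forward pass with random weights; the actual model's dimensions, depth, and optimizations differ, and all names here are illustrative.

```python
import numpy as np

def vit_forward(image, patch=8, d=32, n_classes=3, rng=np.random.default_rng(0)):
    """Minimal single-head ViT-style forward pass (illustrative only)."""
    H, W = image.shape
    # 1. Split the image into non-overlapping patches and flatten each one.
    patches = image.reshape(H // patch, patch, W // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    # 2. Linear patch embedding, a class token, and positional noise.
    W_embed = rng.normal(scale=0.02, size=(patch * patch, d))
    tokens = patches @ W_embed
    cls = rng.normal(scale=0.02, size=(1, d))
    x = np.vstack([cls, tokens]) + rng.normal(scale=0.02, size=(len(tokens) + 1, d))
    # 3. One global self-attention layer: every token attends to all others,
    #    which is the "global attention" the abstract visualizes on brain images.
    Wq, Wk, Wv = (rng.normal(scale=0.02, size=(d, d)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    x = x + attn @ v  # residual connection
    # 4. Classify from the class token (e.g. healthy / MCI / AD).
    W_head = rng.normal(scale=0.02, size=(d, n_classes))
    return x[0] @ W_head  # logits, shape (n_classes,)

logits = vit_forward(np.zeros((64, 64)))
print(logits.shape)  # (3,)
```

In a trained model the random matrices above would be learned parameters, and the attention weights `attn` are what an attention-visualization technique projects back onto the input image.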