Deep neural networks offer a promising approach for capturing complex, non-linear relationships among variables. Because they require immense sample sizes, their potential has yet to be fully tapped for understanding complex relationships between gene expression and human phenotypes. Encouragingly, a growing number of diseases are being studied through consortium efforts. Here we introduce a new analysis framework, MD-AD (Multi-task Deep learning for Alzheimer's Disease neuropathology), which leverages an unexpected synergy between deep neural networks and multi-cohort settings. In these settings, conventional statistical methods can stymie true joint analysis because they (1) require "harmonized" phenotypes (i.e., phenotypes measured in a highly consistent manner) and (2) tend to capture cohort-level variations, obscuring the subtler true disease signals. In contrast, MD-AD incorporates multiple related phenotypes that are sparsely measured across cohorts, and learns complex, non-linear interactions between genes and phenotypes that conventional expression data analysis methods (e.g., component analysis and module detection) do not discover, enabling the model to capture signals subtler than cohort-level variations. Applying MD-AD to the largest available collection of brain samples (N=1,758), we demonstrate that it learns a truly generalizable relationship between a gene expression program and AD-related neuropathology. The learned program generalizes in several important ways, including recapitulation of disease progression in animal models and across tissue types, and we show that previous statistical paradigms do not achieve such generalizability. MD-AD's ability to identify genes with high non-linear relevance to neuropathology enabled us to identify a sex-specific relationship between neuropathology and immune response across microglia, providing a nuanced context for the association between inflammatory genes and AD.
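
As an illustration of how a multi-task model can train on phenotypes that are only sparsely measured across cohorts, the sketch below computes a loss that skips missing labels. It is a minimal sketch under stated assumptions, not the MD-AD implementation: the abstract does not specify the loss, so the use of mean squared error, the per-phenotype averaging, and the function name `masked_multitask_mse` are all hypothetical choices made here for illustration.

```python
from typing import Optional, Sequence


def masked_multitask_mse(
    predictions: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Optional[float]]],
) -> float:
    """Average per-phenotype MSE, using only samples where the phenotype was measured.

    predictions[i][k] is the model output for sample i and phenotype k.
    labels[i][k] is the measured value, or None when phenotype k was not
    recorded for sample i (for example, because its cohort did not collect it).
    Assumption: MSE with equal weight per phenotype; the source does not state this.
    """
    n_tasks = len(predictions[0])
    task_losses = []
    for k in range(n_tasks):
        # Keep squared errors only for samples that have a label for phenotype k.
        sq_errors = [
            (pred[k] - lab[k]) ** 2
            for pred, lab in zip(predictions, labels)
            if lab[k] is not None
        ]
        if sq_errors:  # skip phenotypes with no measurements in this batch
            task_losses.append(sum(sq_errors) / len(sq_errors))
    # Average across the phenotypes that contributed at least one sample.
    return sum(task_losses) / len(task_losses) if task_losses else 0.0


# Two samples, two hypothetical phenotypes; the first sample lacks phenotype 1.
example_predictions = [[1.0, 2.0], [3.0, 0.0]]
example_labels = [[1.0, None], [1.0, 2.0]]
example_loss = masked_multitask_mse(example_predictions, example_labels)
```

Because unmeasured entries contribute nothing to the loss, samples from cohorts that recorded different phenotypes can share one model without first harmonizing or imputing the labels.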