Genome-wide association studies (GWAS) can require enrolling up to millions of participants to power variant discovery. Assembling such cohorts requires manual curation of cases and controls through large-scale collaborations. Biobanks connected to electronic health records (EHR) can facilitate these studies by using data from clinical care systems, such as billing diagnosis codes, as phenotypes. These systems, however, do not define adjudicated cases and controls. Machine learning can add nuance to these definitions. We developed QTPhenProxy, a machine learning model that assigns everyone in a cohort a probability of having the study disease, and then ran a GWAS using the probabilities as a quantitative trait. With an order of magnitude fewer cases than the largest stroke GWAS, our method outperformed previous methods at replicating known variants in stroke and discovered a novel variant in ABCG8 associated with intracerebral hemorrhage in the UK Biobank. QTPhenProxy expands traditional phenotyping to improve the power of GWAS.
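
The idea of treating a per-person disease probability as a quantitative trait can be illustrated with a minimal sketch. This is not the QTPhenProxy implementation. The simulated cohort, the effect sizes, and the helper names `linear_association` and `simulate_dosages` are hypothetical, and the probabilities come from a toy logistic formula rather than from a model trained on EHR data. The sketch regresses the probability on allele dosage for one variant at a time and omits covariates, population-structure correction, and any transformation of the trait that a real analysis might apply.

```python
import math
import random


def linear_association(genotypes, trait):
    """Ordinary least squares of a quantitative trait on allele dosage (0/1/2).

    Returns (slope, standard error of the slope, t statistic).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


def simulate_dosages(n, maf, rng):
    """Draw n allele dosages as the sum of two Bernoulli(maf) alleles."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


# Hypothetical cohort: one variant that shifts disease risk, one that does not.
rng = random.Random(0)
n_people = 2000
causal = simulate_dosages(n_people, 0.3, rng)
null = simulate_dosages(n_people, 0.3, rng)

# Stand-in for model output: a toy probability of disease per person
# (assumed here; a real pipeline would take this from a trained classifier).
probabilities = [
    1.0 / (1.0 + math.exp(-(-3.0 + 0.5 * g + rng.gauss(0.0, 1.0))))
    for g in causal
]

# Association test with the probability as the quantitative trait.
beta_causal, se_causal, t_causal = linear_association(causal, probabilities)
beta_null, se_null, t_null = linear_association(null, probabilities)
print("causal variant t statistic:", round(t_causal, 2))
print("null variant t statistic:", round(t_null, 2))
```

Because every participant contributes a continuous value, the regression uses the whole cohort rather than only adjudicated cases and controls, which is the intuition behind the gain in power described above.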