Blood and urine biomarkers are an essential part of modern medicine, not only for diagnosis, but also for their direct influence on disease. Many biomarkers have a genetic component, and they have been studied extensively with genome-wide association studies (GWAS) and methods that compute polygenic scores (PGSs). However, these methods generally assume both an additive allelic model and an additive genetic architecture for the target outcome, and thereby risk not capturing non-linear allelic effects nor epistatic interactions. Here, we trained and evaluated deep-learning (DL) models for PGS prediction of 34 blood and urine biomarkers in the UK Biobank cohort, and compared them to linear methods. For lipid traits, the DL models greatly outperformed the linear methods, which we found to be consistent across diverse populations. Furthermore, the DL models captured non-linear effects in covariates, non-additive genotype (allelic) effects, and epistatic interactions between SNPs. Finally, when using only genome-wide significant SNPs from GWAS, the DL models performed equally well or better for all 34 traits tested. Our findings suggest that DL can serve as a valuable addition to existing methods for genotype-phenotype modelling in the era of increasing data availability.