“…Recent developments in deep learning [13][14][15] offer significant opportunities to advance the PRS framework [16][17][18][19], as these models have the capacity to capture complex epistatic interactions and to incorporate knowledge of molecular mechanisms. Among them, the Transformer has shown strong potential to address longstanding problems in the biomedical sciences, including prediction of 3D protein structures [20][21][22], biomedical image analysis [23], inference of gene expression from genome sequence [24][25][26], and mapping a sequence of SNPs to a predicted phenotype [27][28]. The Transformer architecture is known for its central use of an "attention" mechanism [29], an operation that dynamically computes the importance of each input element relative to the others, enabling the model to focus on the most relevant features [29][30][31].…”
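The attention operation described above can be sketched as scaled dot-product self-attention, the variant used in the original Transformer. This is an illustrative sketch, not the method of any paper cited here; the dimensions and the framing of the inputs as SNP embeddings are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each query scores every key,
    # and the values are combined according to those scores.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise query-key similarity
    weights = softmax(scores, axis=-1)   # each row is a distribution over inputs
    return weights @ V, weights

rng = np.random.default_rng(0)
# 5 input elements (e.g. hypothetical SNP embeddings), embedding dim 8.
X = rng.normal(size=(5, 8))
# Self-attention: queries, keys, and values all come from the same inputs,
# so every element dynamically weights every other element.
out, w = attention(X, X, X)
print(out.shape, w.shape)  # (5, 8) (5, 5)
```

Each row of `w` sums to 1, so the output for each element is a convex combination of all the value vectors, with the largest weights on the inputs most relevant to it.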