Background: Although genome-wide association studies have identified many genomic regions associated with idiopathic pulmonary fibrosis (IPF), the causal genes and their functions remain largely unknown. Extensive bulk and single-cell expression data are now available for IPF, and there is increasing evidence of a shared genetic basis between IPF and other diseases.
Methods: By leveraging shared genetic information and transcriptome data, we conducted an integrative analysis to identify novel genes for IPF. We first considered observed phenotypes, polygenic risk scores, and genetic correlations to investigate associations between IPF and other traits in the UK Biobank. We then performed local genetic correlation analysis and cross-tissue transcriptome-wide association analysis (TWAS) to identify IPF genes. We further prioritized genes using bulk and single-cell gene expression data.
Findings: We identified 25 traits correlated with IPF at the phenotype level and seven traits genetically correlated with IPF. Using local genetic correlation, we identified 12 candidate genes across 14 genomic regions, including the POT1 locus (p-value = 4.1E-4), which contained variants that were protective against lung cancer but increased IPF risk. Using TWAS, we identified 36 genes, including 12 novel genes for IPF. Annotation-stratified heritability estimation and differential expression analysis of downstream-regulated genes suggested regulatory roles for two candidate genes, MAFK and SMAD2, in IPF.
Interpretation: Our integrative analysis identified new genes for IPF susceptibility and expanded the understanding of the complex genetic architecture of IPF.
Funding: NIHR Leicester Biomedical Research Centre, Three Lakes Partners, the National Institutes of Health, the National Science Foundation, U01HL145567, and UH2HL123886.