Human leukocyte antigen (HLA) genes encode proteins with important roles in the regulation of the immune system. Many studies have also implicated HLA genes in psychiatric and neurodevelopmental disorders. However, these studies usually focus on one disorder and/or on one HLA candidate gene, often with small samples. Here, we accessed a large dataset of 65,534 genotyped individuals consisting of controls (N = 19,645) and cases having one or more of autism spectrum disorder (N = 12,331), attention deficit hyperactivity disorder (N = 14,397), schizophrenia (N = 2,401), bipolar disorder (N = 1,391), depression (N = 18,511), anorexia (N = 2,551) or intellectual disability (N = 3,175). We imputed participants’ HLA alleles to investigate the involvement of HLA genes in these disorders using regression models. We found a pronounced protective effect of DPB1*1501 on susceptibility to autism (p = 0.0094, OR = 0.72) and intellectual disability (p = 0.00099, OR = 0.41), with an increased protective effect on a comorbid diagnosis of both disorders (p = 0.003, OR = 0.29). We also identified a risk allele for intellectual disability, B*5701 (p = 0.00016, OR = 1.33). Associations with both alleles survived FDR correction and a permutation procedure. We did not find significant evidence for replication of previously reported associations for autism or schizophrenia. Our results support the involvement of HLA genes in autism and intellectual disability, a finding that requires replication in other studies. Our study also highlights the importance of large sample sizes in HLA association studies.
Sample recruitment for research consortia, biobanks, and personal genomics companies spans years, necessitating genotyping in batches, using different technologies. As marker content on genotyping arrays varies, integrating such datasets is non-trivial, and its impact on haplotype estimation (phasing) and whole genome imputation, necessary steps for complex trait analysis, remains under-evaluated. Using the iPSYCH dataset, comprising 130,438 individuals genotyped in two stages on different arrays, we evaluated phasing and imputation performance across multiple phasing methods and data integration protocols. While phasing accuracy varied by choice of method and data integration protocol, imputation accuracy varied mostly between data integration protocols. We demonstrate an attenuation in imputation accuracy within samples of non-European origin, highlighting challenges to studying complex traits in diverse populations. Finally, imputation errors can bias association tests and reduce the predictive utility of polygenic scores. Carefully optimized data integration strategies enhance accuracy and replicability of complex trait analyses in complex biobanks.