Summary
Biobanks are being established across the world to understand the genetic, environmental, and epidemiological basis of human diseases, with the goal of improving prevention and treatment. Genome-wide association studies (GWAS) have been very successful at mapping genomic loci for a wide range of human diseases and traits but, in general, lack appropriate representation of diverse ancestries: most biobanks and preceding GWAS are composed of individuals of European ancestries. Here, we introduce the Global Biobank Meta-analysis Initiative (GBMI), a collaborative network of 19 biobanks from 4 continents representing more than 2.1 million consented individuals with genetic data linked to electronic health records. GBMI meta-analyzes summary statistics from GWAS generated using harmonized genotypes and phenotypes from member biobanks. GBMI brings together GWAS results across 6 main ancestry groups: approximately 33,000 individuals of African ancestry, either from Africa or from the admixed-ancestry diaspora (AFR); 18,000 admixed American (AMR); 31,000 Central and South Asian (CSA); 341,000 East Asian (EAS); 1.4 million European (EUR); and 1,600 Middle Eastern (MID) individuals. In this flagship project, we generated GWAS for 14 exemplar diseases and endpoints, including both common diseases and less prevalent diseases that were previously understudied. Using the genetic association results, we confirm that GWAS conducted in biobanks worldwide can be successfully integrated despite heterogeneity in case definitions, recruitment strategies, and baseline characteristics between biobanks. We demonstrate the value of this collaborative effort to improve GWAS power, increase representation, benefit understudied diseases, and improve risk prediction, while also enabling the nomination of disease genes and drug candidates by incorporating gene and protein expression data and providing insight into the underlying biology of the studied traits.
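Combining per-biobank summary statistics is the core computational step described above. The sketch below shows a generic fixed-effect inverse-variance-weighted (IVW) meta-analysis of a single variant, plus Cochran's Q as a simple heterogeneity measure. It is an illustration of the standard technique under the assumption that each biobank reports a log-odds-ratio estimate and its standard error; it is not presented as GBMI's actual pipeline, and the function name `ivw_meta` is hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    betas: per-biobank effect estimates (e.g. log odds ratios).
    ses:   their standard errors.
    Returns (pooled beta, pooled SE, two-sided p-value, Cochran's Q).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need matching, non-empty betas and ses")
    weights = [1.0 / se ** 2 for se in ses]  # precision weights
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q

# Hypothetical estimates for one variant from three biobanks
beta, se, p, q = ivw_meta([0.10, 0.12, 0.08], [0.05, 0.04, 0.06])
```

The pooled standard error is always smaller than any single biobank's, which is the sense in which pooling increases GWAS power; a large Q relative to its degrees of freedom (number of studies minus one) flags between-biobank heterogeneity.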
Summary
With the increasing availability of biobank-scale datasets that incorporate both genomic data and electronic health records, many associations between genetic variants and phenotypes of interest have been discovered. Polygenic risk scores (PRS), which are being widely explored in precision medicine, use the results of association studies to predict the genetic component of disease risk by summing risk alleles weighted by their effect sizes. However, few studies have thoroughly investigated best practices for PRS in global populations across different diseases. In this study, we use data from the Global Biobank Meta-analysis Initiative (GBMI), which comprises individuals of diverse ancestries across continents, to explore methodological considerations and PRS prediction performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRS using heuristic (pruning and thresholding, P+T) and Bayesian (PRS-CS) methods. We found that genetic architecture, such as SNP-based heritability and polygenicity, varied greatly among endpoints. For both PRS construction methods, using a European ancestry LD reference panel resulted in comparable or higher prediction accuracy compared with several non-European panels; this is largely attributable to individuals of European descent still comprising the majority of GBMI participants. PRS-CS overall outperformed the classic P+T method, especially for endpoints with higher SNP-based heritability. For example, substantial improvements were observed in individuals of East Asian ancestry (EAS) using PRS-CS compared with P+T for heart failure (HF) and chronic obstructive pulmonary disease (COPD). Notably, prediction accuracy was heterogeneous across endpoints, biobanks, and ancestries, especially for asthma, which has known variation in disease prevalence across global populations.
Overall, we provide lessons for PRS construction, evaluation, and interpretation using the GBMI and highlight the importance of best practices for PRS in the biobank-scale genomics era.
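To make the P+T approach and the "risk alleles weighted by effect sizes" definition concrete, the sketch below performs greedy clumping and p-value thresholding on toy inputs and then computes a PRS as the dosage-weighted sum of per-SNP effects. It is a simplified illustration under stated assumptions: LD is supplied as a hypothetical dictionary of pairwise r² values (real clumping uses a reference panel and a genomic window), the function names are hypothetical, and PRS-CS, which instead shrinks effect sizes with a Bayesian continuous prior, is not shown.

```python
from typing import Dict, List, Tuple

def clump_and_threshold(pvals: Dict[str, float],
                        ld_r2: Dict[Tuple[str, str], float],
                        p_thresh: float = 5e-8,
                        r2_thresh: float = 0.1) -> List[str]:
    """Greedy P+T: keep the most significant SNP, drop SNPs in LD with it, repeat.

    ld_r2 is a hypothetical lookup of pairwise r^2 (either key order);
    missing pairs are treated as unlinked (r^2 = 0).
    """
    kept: List[str] = []
    for snp in sorted(pvals, key=pvals.get):  # most significant first
        if pvals[snp] >= p_thresh:
            break  # all remaining SNPs are even less significant
        in_ld = any(
            ld_r2.get((snp, k), ld_r2.get((k, snp), 0.0)) > r2_thresh
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept

def polygenic_score(dosages: Dict[str, float],
                    betas: Dict[str, float],
                    snps: List[str]) -> float:
    """PRS = sum over selected SNPs of risk-allele dosage (0-2) times effect size."""
    return sum(betas[s] * dosages[s] for s in snps)

# Toy data: rs1 is in strong LD with the more significant rs3, so it is pruned;
# rs4 fails the p-value threshold.
pvals = {"rs1": 1e-10, "rs2": 1e-9, "rs3": 1e-12, "rs4": 0.01}
ld = {("rs1", "rs3"): 0.8}
selected = clump_and_threshold(pvals, ld)
score = polygenic_score({"rs2": 2, "rs3": 1}, {"rs2": 0.2, "rs3": -0.1}, selected)
```

In practice the p-value and r² thresholds are tuned on held-out data, which is one reason P+T can underperform methods such as PRS-CS that use all SNPs with shrunken weights.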
Keratoconus is characterised by reduced rigidity of the cornea, with distortion and focal thinning that cause blurred vision; however, the pathogenetic mechanisms are unknown. It can lead to severe visual morbidity in children and young adults and is a common indication for corneal transplantation worldwide. Here we report the first large-scale genome-wide association study of keratoconus, including 4,669 cases and 116,547 controls. We identified significant associations at 36 genomic loci that, for the first time, implicate both dysregulation of corneal collagen matrix integrity and cell differentiation pathways as primary disease-causing mechanisms. The results also suggest pleiotropy, with some disease mechanisms shared with other corneal diseases, such as Fuchs endothelial corneal dystrophy. The common variants associated with keratoconus explain 12.5% of the genetic variance, which shows potential for the future development of a diagnostic test to detect susceptibility to disease.
Deletion of 18q12.2 is an increasingly recognized condition with a distinct neuropsychiatric phenotype. Twenty-two patients have been described with overlapping neurobehavioral disturbances, including developmental delay, intellectual disability of variable degree, seizures, motor coordination disorder, behavioral/emotional disturbances, and autism spectrum disorders. The CUGBP Elav-like family member 4 (CELF4) gene at 18q12.2 encodes an RNA-binding protein that binds subsets of RNAs involved in pre- and postsynaptic neurotransmission, including transcripts of almost 30% of potential autism-related genes. Haploinsufficiency of CELF4 was associated with a diagnosis of autism or autistic behavior in two adult patients with de novo 18q12.2 deletions. We report on a girl and her mildly affected mother with a 275 kb deletion at 18q12.2 involving CELF4 and KIAA1328, a gene whose disruption has not been associated with any known disease. The child was diagnosed with syndromic intellectual disability and autism at 6 years of age. Her mother had minor dysmorphisms, mild intellectual disability, and autistic behavior. The deletion reported in this family is one of the smallest reported at 18q12.2 so far. This is also the first full clinical description of maternally inherited CELF4 haploinsufficiency. The present study refines the molecular and neuropsychiatric phenotype associated with 18q12.2 deletion leading to CELF4 haploinsufficiency and provides evidence for a role of CELF4 in brain development and autism spectrum disorders.
Despite the success of genome-wide association studies, much of the genetic contribution to complex traits remains unexplained. Here, we analyse high-coverage whole-genome sequencing data to evaluate the contribution of rare genetic variants to 414 plasma proteins. The frequency distribution of genetic variants is skewed towards the rare spectrum, and damaging variants are more often rare. We estimate that less than 4.3% of the narrow-sense heritability is expected to be explained by rare variants in our cohort. Using a gene-based approach, we identify cis-associations for 237 of the proteins, slightly more than with a GWAS (N = 213), and we identify 34 associated loci in trans. Several associations are driven by rare variants, which on average have larger effects. We therefore conclude that rare variants could be of importance for precision medicine applications but make a more limited contribution to the missing heritability of complex diseases.