Genome-wide association studies (GWAS) have identified many variants associated with complex traits, but identifying the causal gene(s) is a major challenge. Here we present an open resource that provides systematic fine-mapping and gene prioritization across 133,441 published human GWAS loci. We integrate genetics (GWAS Catalog and UK Biobank) with transcriptomic, proteomic and epigenomic data, including systematic disease-disease and disease-molecular trait colocalization results across 92 cell types and tissues. We identify 729 loci fine-mapped to a single coding causal variant and colocalized with a single gene. We trained a machine learning model on the fine-mapped genetics and functional genomics data, using 445 gold-standard curated GWAS loci, to distinguish causal genes from neighboring genes, outperforming a naive distance-based model. Our prioritized genes were enriched for known approved drug targets (OR = 8.1, 95% CI: [5.7, 11.5]). These results are publicly available through a web portal (http://genetics.opentargets.org), enabling users to easily prioritize genes at disease-associated loci and assess their potential as drug targets.
Genome-wide association studies (GWAS) have identified many variants robustly associated with complex traits, but identifying the gene(s) mediating such associations is a major challenge. Here we present an open resource that provides systematic fine-mapping and protein-coding gene prioritization across 133,441 published human GWAS loci. We integrate diverse data sources, including genetics (from GWAS Catalog and UK Biobank) as well as transcriptomic, proteomic and epigenomic data across many tissues and cell types. We also provide systematic disease-disease and disease-molecular trait colocalization results across 92 cell types and tissues and identify 729 loci fine-mapped to a single coding causal variant and colocalized with a single gene. We trained a machine learning model on the fine-mapped genetics and functional genomics data, using 445 gold-standard curated GWAS loci, to distinguish causal genes from background genes at the same loci, outperforming a naive distance-based model. Genes prioritized by our model are enriched for known approved drug targets (OR = 8.1, 95% CI: [5.7, 11.5]). These results will be regularly updated and are publicly available through a web portal, Open Targets Genetics (OTG, http://genetics.opentargets.org), enabling users to easily prioritize genes at disease-associated loci and assess their potential as drug targets.
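As a rough illustration of how an enrichment such as "OR = 8.1, 95% CI: [5.7, 11.5]" is typically derived, the sketch below computes an odds ratio and a Wald confidence interval from a 2x2 table. The abstract does not state which interval method or counts were used, so the function name `odds_ratio_ci`, the Wald approximation, and the counts in the usage lines are all assumptions made for illustration only.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Hypothetical layout (not taken from the source):
      a: prioritized genes that are approved drug targets
      b: prioritized genes that are not drug targets
      c: non-prioritized genes that are approved drug targets
      d: non-prioritized genes that are not drug targets
    z = 1.96 gives an approximate 95% interval.
    All four counts must be non-zero for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, chosen only to show the call; they do not come from the paper.
or_value, ci_low, ci_high = odds_ratio_ci(10, 20, 5, 40)
print(f"OR = {or_value:.1f}, 95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes 1, as the reported [5.7, 11.5] does, indicates that the enrichment is unlikely to be due to chance under this approximation.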
Interacting proteins tend to have similar functions, influencing the same organismal traits. Interaction networks can be used to expand the list of candidate trait-associated genes from genome-wide association studies. Here, we performed network-based expansion of trait-associated genes for 1,002 human traits showing that this recovers known disease genes or drug targets. The similarity of network expansion scores identifies groups of traits likely to share an underlying genetic and biological process. We identified 73 pleiotropic gene modules linked to multiple traits, enriched in genes involved in processes such as protein ubiquitination and RNA processing. In contrast to gene deletion studies, pleiotropy as defined here captures specifically multicellular-related processes. We show examples of modules linked to human diseases enriched in genes with known pathogenic variants that can be used to map targets of approved drugs for repurposing. Finally, we illustrate the use of network expansion scores to study genes at inflammatory bowel disease genome-wide association study loci, and implicate inflammatory bowel disease-relevant genes with strong functional and genetic support.
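The idea of network-based expansion can be sketched with a random walk with restart, one common propagation technique. The abstract does not say which algorithm the authors used, so this is a swapped-in illustration, and the function name, the restart probability and the toy five-node graph are all hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, iters=100, tol=1e-10):
    """Propagate seed scores over an undirected graph.

    adj   : dict mapping each node to a list of its neighbours
    seeds : set of nodes with prior trait association (e.g. GWAS genes)
    Assumes every node has at least one neighbour; isolated nodes would
    leak probability mass in this simplified version.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A-B-C-D-E with a single seed gene "A" (hypothetical names).
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = random_walk_with_restart(toy, seeds={"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

Non-seed genes that sit close to seeds in the network receive higher scores, which is the sense in which the candidate list is "expanded" beyond the genes directly implicated by association signals.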
Detection of sub-microscopic levels of disease (minimal residual disease; MRD) in childhood acute lymphoblastic leukaemia (ALL) during treatment is an important prognostic factor. Currently, stratification of therapy for the new frontline trial in childhood ALL (UKALL 2011) is provided by MRD analysis using real-time quantitative PCR (RQ-PCR) to identify and quantitate the patient-specific rearrangements of the immunoglobulin (Ig) and T-cell receptor (TCR) genes. The current methodology is expensive, time-consuming and complex to perform. Although MRD has proven to be a powerful and essential tool in stratification of ALL patients, 8% of individuals in the current UKALL 2011 trial do not have an informative MRD result. Recently, Next Generation Sequencing (NGS) has led to the opportunity to improve the sensitivity and specificity of Ig/TCR-based MRD analysis. In this study, we focussed on the IgH locus using BIOMED-2 primers (van Dongen et al., 2003) modified to allow target identification and quantitation by deep sequencing on the Illumina MiSeq platform. We developed a novel pipeline to automate the clustering and classification of sequencing reads leading to characterisation of the clonal subtypes present. In a sample of 12 patients, the method correctly identified all the major clones revealed by current methodologies, and also detected many related and unrelated low-frequency clones. Additional targets were also identified in patients in which no IgH targets were detectable by current methodologies. These NGS-identified targets were subsequently used to monitor MRD by RQ-PCR to the desired quantitative range required for stratification of therapy according to UKALL 2011 guidelines (Figure 1). In addition, we were able to delineate patterns of IgH rearrangements in two patients previously shown to have oligoclonal (>2) rearrangements.
Such patients represent a time-consuming technical challenge for current technologies, as it is important that all targets at the locus are followed by RQ-PCR to provide an informative and robust MRD result. Furthermore, by clustering similar sequences, we identified diagnostic samples where multiple V regions are attached to the same N1-D-N2-J region. This may allow for the study of clonal evolution in follow-up samples. Altogether, NGS has the potential to significantly reduce false negative results, as multiple evolved clones can be identified. This methodology also represents a significant time saving (5-7 days) in comparison to established methods (3-4 weeks).
Figure 1. (a) Polyacrylamide electrophoresis could not recognise a target to use in current MRD methodologies (well 1 containing the products from a PCR reaction that would amplify VH1 and VH7, and wells 2-6 amplifying VH2-6, respectively), while the NGS pipeline could identify a VH7 rearrangement (b). (c) ASOs were designed to amplify the NGS-identified VH7-81*01 DH3-9*01 JH4*02 rearrangement and optimised to correctly identify 10⁻², 10⁻³ and 10⁻⁴ dilutions with a single NAC (non-amplification control; monocytes from 20 normal individuals) replicate amplified, therefore meeting current guidelines for a MRD target.
Having established NGS for identifying clonal targets in ALL, we are currently assessing the ability of the method and pipeline to quantify disease levels in end of induction and relapse samples, previously analysed by RQ-PCR, to determine the concordance between the methodologies. Indeed, logarithmic dilution series of patient DNA in a normal background revealed that stratification based on a clinical threshold of 1 in 1,000,000 is possible using this methodology. Further investigation into the clinical utility of NGS for MRD analysis will focus on analysing earlier time points in treatment and studying the potential use of blood rather than bone marrow. Altogether, this will further improve the predictive value and specificity of MRD testing. Disclosures: No relevant conflicts of interest to declare.
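The clustering of similar reads into clonal groups described above can be sketched as a greedy, abundance-ordered grouping by mismatch count. This is only a minimal sketch under stated assumptions: the abstract does not describe the pipeline's actual algorithm, a real workflow would first align reads to germline V, D and J segments, and the function name, the mismatch threshold and the toy sequences are all hypothetical.

```python
from collections import Counter


def cluster_reads(reads, max_mismatches=1):
    """Greedily group reads around their most abundant representatives.

    Reads are visited from most to least frequent. A read joins the first
    cluster whose representative has the same length and differs at no
    more than `max_mismatches` positions; otherwise it founds a new cluster.
    Returns (representative, read_count, fraction_of_all_reads) tuples.
    """
    counts = Counter(reads)
    clusters = []  # each entry is [representative_sequence, total_reads]
    for seq, n in counts.most_common():
        for cluster in clusters:
            rep = cluster[0]
            if len(rep) == len(seq):
                mismatches = sum(x != y for x, y in zip(rep, seq))
                if mismatches <= max_mismatches:
                    cluster[1] += n
                    break
        else:
            clusters.append([seq, n])
    total = sum(counts.values())
    return [(rep, n, n / total) for rep, n in clusters]


# Toy reads (not real IgH sequences): one dominant clone, a one-base
# variant of it, and an unrelated minor clone.
reads = ["ACGTACGT"] * 6 + ["ACGTACGA"] * 2 + ["TTTTCCCC"] * 2
for rep, n, frac in cluster_reads(reads):
    print(rep, n, f"{frac:.0%}")
```

The per-cluster fraction is the kind of quantity that could be compared against a clinical threshold, although calibrating it against a dilution series, as the abstract describes, is outside the scope of this sketch.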