Highlights
• Human pancreatic islets are key drivers of diabetes and related pathophysiology
• TIGER integrates omics and expression regulatory variation in 514 human islet samples
• TIGER expression regulatory variation allows the identification of diabetes effector genes
• The integrated human islet data in TIGER are publicly available through http://tiger.bsc.es
Genome-wide association studies (GWAS) are not fully comprehensive, as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implement an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels and includes the analysis of the X chromosome and non-additive models to test for association. We apply this methodology to 62,281 subjects across 22 age-related diseases and identify 94 genome-wide associated loci, including 26 previously unreported. Moreover, we observe that 27.7% of the 94 loci are missed if we use standard imputation strategies with a single reference panel, such as HRC, and only test the additive model. Among the new findings, we identify three novel low-frequency recessive variants with odds ratios larger than 4, which need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases.
In this deliverable, the ExaQUte xmc library is introduced. This report is meant to serve as a supplement to the public release of the library. In the following sections, the ExaQUte xmc library is described along with its current and future capabilities. The structure of the library, along with its dynamic import mechanism, is described using code samples. The algorithms behind the example files supplied with the public release are also explained in detail.
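A dynamic import mechanism of this kind typically lets a library choose classes at run time from strings (for example, entries in a configuration file) instead of hard-coded imports. The sketch below shows that general pattern with Python's standard `importlib`; the helper names `dynamic_import` and `instantiate` and the `{"class": ..., "parameters": ...}` layout are hypothetical illustrations, not the xmc API.

```python
import importlib
from datetime import timedelta
from typing import Any, Dict


def dynamic_import(dotted_path: str) -> Any:
    """Return the object named by a 'package.module.Name' string."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Name', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # import the module on demand
    return getattr(module, attr_name)  # then look up the class or function in it


def instantiate(spec: Dict[str, Any]) -> Any:
    """Build an object from a hypothetical {'class': path, 'parameters': kwargs} spec."""
    cls = dynamic_import(spec["class"])
    return cls(**spec.get("parameters", {}))


if __name__ == "__main__":
    # A standard-library class stands in for a library component here.
    delay = instantiate({"class": "datetime.timedelta", "parameters": {"hours": 2}})
    print(delay == timedelta(hours=2))  # True
```

The benefit of this design is that swapping one component for another only requires editing the string in the specification, not the code that consumes it.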
The combined analysis of haplotype panels with phenotype clinical cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels), leaving structural variants (SVs) largely unexplored. Here, we contribute to filling this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of SVs. By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 Illumina high-coverage (30x) whole genomes from the Iberian GCAT Cohort, with a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel is able to impute up to 14 360 728 SNVs/indels and 23 179 SVs, a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with mononeuritis of the lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.
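The ≥50 bp cut-off that separates SVs from indels can be expressed as a simple size rule. The sketch below classifies a variant from its reference and alternative allele sequences; the function name and the choice of the allele length difference as the variant "size" are assumptions for illustration. The study's own pipeline, which integrates multiple callers and LRMs, is far more involved.

```python
def classify_variant(ref: str, alt: str, sv_min_length: int = 50) -> str:
    """Classify a sequence-resolved REF/ALT pair as 'SNV', 'indel', 'SV' or 'other'.

    Size is taken as the length difference between the two alleles, so only
    insertions and deletions are covered; balanced events such as inversions
    would need breakpoint-based information instead.
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    size = abs(len(ref) - len(alt))
    if size >= sv_min_length:
        return "SV"
    if size > 0:
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions of equal length


if __name__ == "__main__":
    print(classify_variant("A", "G"))             # SNV
    print(classify_variant("A", "AT"))            # indel
    print(classify_variant("A" + "C" * 50, "A"))  # SV (50 bp deleted)
```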
Python has been adopted as a programming language by a large number of scientific communities. In addition to its easy programming interface, the large number of libraries and modules made available by many contributors has taken this language to the top of the list of the most popular programming languages in scientific applications. However, one main drawback of Python is its lack of support for concurrency or parallelism. PyCOMPSs is a proven approach to support task-based parallelism in Python that enables applications to be executed in parallel on distributed computing platforms. This paper presents PyCOMPSs and how it has been tailored to execute tasks in heterogeneous and multi-threaded environments. We present an approach to combine the task-level parallelism provided by PyCOMPSs with the thread-level parallelism provided by MKL. Performance and behavioral results on heterogeneous distributed computing clusters show the benefits and capabilities of PyCOMPSs in both HPC and Big Data infrastructures.
Python is a popular programming language due to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.
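The task-based decomposition behind such linear algebra benchmarks can be sketched as a blocked matrix multiplication in which every block product is an independent task. The sketch below uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for the PyCOMPSs runtime; in PyCOMPSs itself, functions are marked with the `@task` decorator and results are synchronized with `compss_wait_on`, and the block kernels would typically call a multi-threaded library such as MKL instead of pure Python. All helper names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]
BlockGrid = List[List[Matrix]]


def block_multiply(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of two blocks: the unit of work of one task."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def block_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two blocks of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def to_blocks(m: Matrix, bs: int) -> BlockGrid:
    """Split a matrix into a grid of bs x bs blocks (dimensions divisible by bs)."""
    return [
        [[row[j:j + bs] for row in m[i:i + bs]] for j in range(0, len(m[0]), bs)]
        for i in range(0, len(m), bs)
    ]


def from_blocks(grid: BlockGrid) -> Matrix:
    """Reassemble a grid of blocks into a single matrix."""
    rows: Matrix = []
    for block_row in grid:
        for r in range(len(block_row[0])):
            rows.append([x for block in block_row for x in block[r]])
    return rows


def blocked_matmul(a: BlockGrid, b: BlockGrid, pool: ThreadPoolExecutor) -> BlockGrid:
    """C[i][j] = sum over k of A[i][k] @ B[k][j], one task per block product."""
    n, inner, m = len(a), len(b), len(b[0])
    # Submit every block product as an independent, asynchronous task.
    futures = [
        [[pool.submit(block_multiply, a[i][k], b[k][j]) for k in range(inner)] for j in range(m)]
        for i in range(n)
    ]
    # Wait for the partial products and reduce them into each output block.
    c: BlockGrid = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = futures[i][j][0].result()
            for fut in futures[i][j][1:]:
                acc = block_add(acc, fut.result())
            row.append(acc)
        c.append(row)
    return c


if __name__ == "__main__":
    A = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    B = [[float((i + j) % 3) for j in range(4)] for i in range(4)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        C = from_blocks(blocked_matmul(to_blocks(A, 2), to_blocks(B, 2), pool))
    print(C == block_multiply(A, B))  # True
```

Because each `block_multiply` call depends only on its two input blocks, a runtime is free to schedule these tasks on different cores or nodes; only the reduction waits on their results. With pure-Python kernels the GIL prevents real speed-up from threads, so the sketch shows the task structure, not the performance.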
Genome-wide association studies (GWAS) are not fully comprehensive, as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implemented an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels and includes the analysis of the X chromosome and non-additive models to test for association. We applied this methodology to 62,281 subjects across 22 age-related diseases and identified 94 genome-wide associated loci, including 26 previously unreported. We observed that 27.6% of the 94 loci would be missed if we only used standard imputation strategies and only tested the additive model. Among the new findings, we identified three novel low-frequency recessive variants with odds ratios larger than 4, which would need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases.

Genome-wide association studies (GWAS) have been successful in identifying thousands of associations between genetic variation and human complex diseases and traits [1]. Nevertheless, for most complex diseases, only a small fraction of their genetic architecture is known and only a small amount of the estimated heritability is explained [2]. Variants that individually make small contributions to disease risk, and/or are rare in the population, are often missed by most GWAS even though their role in the pathophysiology of complex diseases can be crucial. Some of the current limitations of GWAS could be overcome by increasing sample sizes and, as recently demonstrated, by applying more comprehensive analytical methods with improved imputation strategies [3].
Though the increase in sample size might allow the detection of more genetic signals, it also imposes major methodological and computational requirements. These can force scientists to restrict and simplify the analysis by limiting it to autosomal chromosomes, a single reference panel for imputation, and a single (additive) inheritance model for association testing, leaving a relevant fraction of the genetic architecture of the disease unexplored [4].

The genetic variants that modify the risk of developing a particular complex disease may contribute to the final phenotype through different functional mechanisms, each defined by a particular model of inheritance, which is in turn reflected in a characteristic distribution of affected alleles across patients and healthy individuals in GWAS. For example, the additive inheritance model, which is often the only genetic model tested, assumes that the risk of the disease is proportional to the number of risk alleles in an individual, i.e., that the effect of the heterozygous genotype is halfway between the two possible homozygous genotypes. However, some variants...
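The difference between inheritance models comes down to how each individual's number of risk alleles (0, 1 or 2) is recoded before the association test. The sketch below lists common encodings and computes a crude odds ratio for the binary (non-additive) ones from genotype counts. The model names, the `overdominant` entry and the toy counts are illustrative assumptions, not the GUIDANCE implementation, and real GWAS tools typically test association with regression models rather than 2×2 tables.

```python
from typing import Dict, List, Sequence, Tuple

# Predictor value assigned to 0, 1 and 2 copies of the risk allele.
ENCODINGS: Dict[str, Tuple[int, int, int]] = {
    "additive": (0, 1, 2),      # heterozygote halfway between the homozygotes
    "dominant": (0, 1, 1),      # one risk allele is enough
    "recessive": (0, 0, 1),     # two risk alleles are required
    "overdominant": (0, 1, 0),  # heterozygotes differ from both homozygotes
}


def encode(genotypes: Sequence[int], model: str) -> List[int]:
    """Recode risk-allele counts (0/1/2) under the given inheritance model."""
    table = ENCODINGS[model]
    return [table[g] for g in genotypes]


def binary_odds_ratio(cases: Sequence[int], controls: Sequence[int], model: str) -> float:
    """Odds ratio of code 1 vs code 0 under a binary inheritance model.

    `cases` and `controls` hold the number of individuals carrying
    0, 1 and 2 risk alleles, in that order.
    """
    table = ENCODINGS[model]
    if set(table) != {0, 1}:
        raise ValueError(f"{model!r} is not a binary encoding")
    # Collapse the three genotype classes into exposed (code 1) / unexposed (code 0).
    exp_cases = sum(n for n, code in zip(cases, table) if code == 1)
    unexp_cases = sum(n for n, code in zip(cases, table) if code == 0)
    exp_controls = sum(n for n, code in zip(controls, table) if code == 1)
    unexp_controls = sum(n for n, code in zip(controls, table) if code == 0)
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


if __name__ == "__main__":
    print(encode([0, 1, 2], "recessive"))  # [0, 0, 1]
    # Toy genotype counts (0/1/2 risk alleles), invented for illustration only.
    cases, controls = (600, 350, 50), (640, 350, 10)
    print(round(binary_odds_ratio(cases, controls, "recessive"), 2))  # 5.21
    print(round(binary_odds_ratio(cases, controls, "dominant"), 2))   # 1.19
```

In this toy example the excess risk sits entirely in the homozygote class, so the recessive coding captures it sharply while the dominant coding, which lumps heterozygotes with risk homozygotes, dilutes it. The same dilution under the additive coding gives one intuition for why recessive variants can need much larger samples to be detected when only the additive model is tested.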