Motivation: LD score regression is a reliable and efficient method of using genome-wide association study (GWAS) summary-level results data to estimate the SNP heritability of complex traits and diseases, partition this heritability into functional categories, and estimate the genetic correlation between different phenotypes. Because the method relies on summary-level results data, LD score regression is computationally tractable even for very large sample sizes. However, publicly available GWAS summary-level data are typically stored in different databases and have different formats, making it difficult to apply LD score regression to estimate genetic correlations across many different traits simultaneously. Results: In this manuscript, we describe LD Hub, a centralized database of summary-level GWAS results for 173 diseases/traits from different publicly available resources/consortia and a web interface that automates the LD score regression analysis pipeline. To demonstrate functionality and validate our software, we replicated previously reported LD score regression analyses of 49 traits/diseases using LD Hub and estimated SNP heritability and the genetic correlation across the different phenotypes. We also present new results obtained by uploading a recent atopic dermatitis GWAS meta-analysis to examine the genetic correlation between the condition and other potentially related traits. In response to the growing availability of publicly accessible GWAS summary-level results data, our database and the accompanying web interface will ensure maximal uptake of the LD score regression methodology, provide a useful database for the public dissemination of GWAS results, and provide a method for easily screening hundreds of traits for overlapping genetic aetiologies. (Bioinformatics, Advance Access published September 22, 2016.) Availability and implementation: The web interface and instructions for using LD Hub are available at
Y-chromosomal (Y-DNA) haplogroups are more widely used in population genetics than in genetic epidemiology, although associations between Y-DNA haplogroups and several traits, including cardiometabolic traits, have been reported. Even in apparently homogeneous populations defined by principal component analyses, there is still Y-DNA haplogroup variation, which results from population history. Therefore, hidden stratification and/or differential phenotypic effects by Y-DNA haplogroups could exist. To test this, we hypothesised that stratifying individuals according to their Y-DNA haplogroups before testing for associations between autosomal single nucleotide polymorphisms (SNPs) and phenotypes would yield differences in association. For proof of concept, we derived Y-DNA haplogroups from 6537 males from two epidemiological cohorts, the Avon Longitudinal Study of Parents and Children (ALSPAC) (n = 5080; 816 Y-DNA SNPs) and the 1958 Birth Cohort (n = 1457; 1849 Y-DNA SNPs), and studied the robust associations between 32 SNPs and body mass index (BMI), including SNPs in or near Fat Mass and Obesity-associated protein (FTO), which yield the strongest effects. Overall, no association was replicated in both cohorts when Y-DNA haplogroups were considered, which suggests that, for BMI at least, there is little evidence of differences in phenotype or SNP association by Y-DNA structure. Further studies using other traits, phenome-wide association studies (PheWAS), other haplogroups and/or autosomal SNPs are required to test the generalisability and utility of this approach.
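The stratified approach described in this abstract can be illustrated with a minimal sketch: group individuals by Y-DNA haplogroup, then estimate the SNP-BMI association separately in each stratum. This is not the authors' pipeline. The function names, the use of a plain least-squares slope, and the toy records (haplogroup label, genotype coded 0/1/2, BMI) are all assumptions made for illustration, and the numbers are invented.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of y on x; returns None if x has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def stratified_slopes(records):
    """Group (haplogroup, genotype, bmi) records by haplogroup
    and fit one genotype-on-BMI slope per stratum."""
    strata = defaultdict(lambda: ([], []))
    for haplogroup, genotype, bmi in records:
        strata[haplogroup][0].append(genotype)
        strata[haplogroup][1].append(bmi)
    return {hg: ols_slope(xs, ys) for hg, (xs, ys) in strata.items()}


# Invented toy data: (haplogroup, effect-allele count, BMI in kg/m^2).
records = [
    ("R1b", 0, 24.0), ("R1b", 1, 25.0), ("R1b", 2, 26.0),
    ("I1", 0, 23.0), ("I1", 1, 23.5), ("I1", 2, 24.0),
]
slopes = stratified_slopes(records)
```

A real analysis would also need standard errors, covariates, and a formal test of whether the per-stratum effects differ. The sketch only shows the grouping step that precedes those tests.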
Recent technological advances have created challenges for geneticists and a need to adapt to a wide range of new bioinformatics tools and an expanding wealth of publicly available data (e.g. mutation databases, software). This wide range of methods and the diversity of file formats used in sequence analysis are a significant issue, with a considerable amount of time spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that, although many researchers possess 'just enough' knowledge to analyse their data, they do not make full use of the tools and databases that are available and do not know how their data were created. The primary aim of this review is to document some of the key approaches and provide an analysis schema to make the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations/genes. This review will also compare the methods used to identify highly penetrant variants when data are obtained from consanguineous individuals as opposed to non-consanguineous individuals, and when Mendelian disorders are analysed as opposed to common-complex disorders.
INTRODUCTION
Next generation sequencing (NGS) and other high throughput technologies have brought new challenges with them. The colossal amount of information that is produced has led researchers to look for ways of reducing the time and effort it takes to analyse the resulting data whilst also keeping up with the storage needs of the resulting files, which are on the order of gigabytes each. The recently emerged variant call format (VCF) has provided a partial way out of this complex issue [1]. Using a reference sequence and comparing it with the query sequence, only the differences between the two are encoded into a VCF file.
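To make the "only the differences are encoded" idea concrete, the following minimal sketch reads the eight fixed, tab-delimited columns of VCF data lines (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function names and the toy records are assumptions for illustration, with invented positions and identifiers. Real files are better handled by an established VCF library.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into its eight fixed fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position on the reference
        "id": vid,
        "ref": ref,             # allele in the reference sequence
        "alt": alt.split(","),  # one or more alternate alleles
        "qual": qual,
        "filter": flt,
        "info": info,
    }


def read_variants(lines):
    """Parse all data lines, skipping blank lines and '#' header lines."""
    return [
        parse_vcf_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


# Invented toy content: two header lines followed by two variant records.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t50\tPASS\tDP=30",
    "2\t67890\trs0000\tC\tT,G\t99\tPASS\tDP=45",
]
variants = read_variants(toy_vcf)
```

Each record names a position plus the reference and alternate alleles at that position. Sites where the sample matches the reference are typically not written out, which is one reason the files stay small.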
Not only are VCF files substantially smaller in size (>300x in relation to BAM files, which store all raw read alignments), they also make the data relatively easy to analyse, since there are many bioinformatics tools (e.g. annotation, mutation effect prediction) which accept the VCF format as standard input. The Genome Analysis Toolkit (GATK) made available by the Broad Institute also provides useful suggestions to bring a universal standard for the annotation and filtering of VCF files [2]. These reasons have made VCF the established format for the sharing of genetic variation produced from large sequencing projects (e.g. 1000 Genomes Project; NHLBI Exome Project, also known as EVS). However, the VCF does have some disadvantages. The files can be information dense and initially difficult to understand and parse. (bioRxiv preprint, first posted online Nov. 6, 2014; doi: http://dx.doi.org/10.1101/011130; made available under a CC-BY 4.0 International license.) Comprehensive information about the VCF and its companion software VCFtools [1] is available online (vcftools.sourceforge.net). Because of the substantial decrease in the price o...
Objective: To evaluate the association between Y chromosome and mitochondrial DNA haplogroups and a number of sexually-dimorphic behavioural and psychiatric traits. Methods: The study sample included 4,211 males and 4,009 females with mitochondrial DNA haplogroups and 4,788 males with Y chromosome haplogroups who are part of the Avon Longitudinal Study of Parents and Children (ALSPAC). Different subsets of these populations were assessed using the Developmental and Well-being Assessment (DAWBA), Strengths and Difficulties Questionnaire (SDQ), Social and Communication Disorder Checklist (SCDC) and Psychotic Like Symptom Interview (PLIKSi) as measures of behavioural and psychiatric traits. Logistic regression was used to measure the association between haplogroups and the traits above. Results: We found that the majority of behavioural traits in our cohort differed between males and females. However, Y chromosome and mitochondrial DNA major haplogroups were not associated with any of the variables. In addition, secondary analyses of Y chromosome and mitochondrial DNA subgroups also showed no association. Conclusion: Y chromosome and mitochondrial DNA haplogroups are not associated with behavioural and psychiatric traits in a sample representative of the UK population.