Liver cytochrome P450s (P450s) play critical roles in drug metabolism, toxicology, and metabolic processes. Despite rapid progress in the understanding of these enzymes, a systematic investigation of the full spectrum of functionality of individual P450s, the interrelationship or networks connecting them, and the genetic control of each gene/enzyme is lacking. To this end, we genotyped, expression-profiled, and measured P450 activities of 466 human liver samples and applied a systems biology approach via the integration of genetics, gene expression, and enzyme activity measurements. We found that most P450s were positively correlated among themselves and were highly correlated with known regulators as well as thousands of other genes enriched for pathways relevant to the metabolism of drugs, fatty acids, amino acids, and steroids. Genome-wide association analyses between genetic polymorphisms and P450 expression or enzyme activities revealed sets of SNPs associated with P450 traits, and suggested the existence of both cis-regulation of P450 expression (especially for CYP2D6) and more complex trans-regulation of P450 activity. Several novel SNPs associated with CYP2D6 expression and enzyme activity were validated in an independent human cohort. By constructing a weighted coexpression network and a Bayesian regulatory network, we defined the human liver transcriptional network structure, uncovered subnetworks representative of the P450 regulatory system, and identified novel candidate regulatory genes, namely, EHHADH, SLC10A1, and AKR1D1. The P450 subnetworks were then validated using gene signatures responsive to ligands of known P450 regulators in mouse and rat. This systematic survey provides a comprehensive view of the functionality, genetic control, and interactions of P450s.
Genome-wide association studies (GWAS) provide an important approach to identifying common genetic variants that predispose to human disease. A typical GWAS may genotype hundreds of thousands of single nucleotide polymorphisms (SNPs) located throughout the human genome in a set of cases and controls. Logistic regression is often used to test for association between a SNP genotype and case versus control status, with corresponding odds ratios (ORs) typically reported only for those SNPs meeting selection criteria. However, when these estimates are based on the original data used to detect the variant, the results are affected by a selection bias sometimes referred to as the "winner's curse" (Capen and others, 1971). The actual genetic association is typically overestimated. We show that such selection bias may be severe in the sense that the conditional expectation of the standard OR estimator may be quite far away from the underlying parameter. Also, standard confidence intervals (CIs) may have coverage far from the desired rate for the selected ORs. We propose and evaluate 3 bias-reduced estimators, and also corresponding weighted estimators that combine corrected and uncorrected estimators, to reduce selection bias. Their corresponding CIs are also proposed. We study the performance of these estimators using simulated data sets and show that they reduce the bias and give CI coverage close to the desired level under various scenarios, even for associations having only small statistical power.
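The selection bias described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator or simulation design. It assumes that the log-OR estimate for a SNP is approximately normal around the true log-OR with a known standard error, and that a SNP is "selected" when its absolute z-score exceeds a cutoff. The function name `simulate_winners_curse` and all parameter values (true OR, standard error, cutoff) are hypothetical choices made for illustration.

```python
import math
import random


def simulate_winners_curse(true_or=1.2, se=0.08, z_cutoff=4.0,
                           n_snps=50_000, seed=0):
    """Compare the average OR estimate before and after significance selection.

    Assumption: each log-OR estimate is drawn from Normal(log(true_or), se),
    a large-sample approximation rather than a full case-control simulation.
    """
    rng = random.Random(seed)
    true_log_or = math.log(true_or)
    all_estimates = []
    selected = []
    for _ in range(n_snps):
        est = rng.gauss(true_log_or, se)
        all_estimates.append(est)
        # Keep only replicates that pass the (hypothetical) z-score cutoff.
        if abs(est / se) > z_cutoff:
            selected.append(est)
    mean_all = sum(all_estimates) / len(all_estimates)
    mean_selected = sum(selected) / len(selected) if selected else float("nan")
    # Averages are taken on the log scale, then exponentiated
    # (that is, geometric means of the OR estimates).
    return math.exp(mean_all), math.exp(mean_selected), len(selected)


or_all, or_selected, n_selected = simulate_winners_curse()
print(f"true OR = 1.20, mean over all replicates = {or_all:.3f}")
print(f"mean over {n_selected} selected replicates = {or_selected:.3f}")
```

Under these assumptions, the average over all replicates stays close to the true OR, while the average over the selected replicates is inflated, because only estimates that happened to land far from the null pass the cutoff.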
Genome-wide association studies (GWAS) have achieved great success identifying common genetic variants associated with common human diseases. However, to date, the massive amounts of data generated from GWAS have not been maximally leveraged and integrated with other types of data to identify associations beyond those associations that meet the stringent genome-wide significance threshold. Here, we present a novel approach that leverages information from genetics of gene expression studies to identify biological pathways enriched for expression-associated genetic loci associated with disease in publicly available GWAS results. Specifically, we first identify SNPs in population-based human cohorts that associate with the expression of genes (eSNPs) in the metabolically active tissues liver, subcutaneous adipose, and omental adipose. We then use this functionally annotated set of SNPs to investigate pathways enriched for eSNPs associated with disease in publicly available GWAS data. As an example, we tested 110 pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and identified 16 pathways enriched for genes corresponding to eSNPs that show evidence of association with type 2 diabetes (T2D) in the Wellcome Trust Case Control Consortium (WTCCC) T2D GWAS. We then replicated these findings in the Diabetes Genetics Replication and Meta-analysis (DIAGRAM) study. Many of the pathways identified have been proposed as important candidate pathways for T2D, including the calcium signaling pathway, the PPAR signaling pathway, and TGF-beta signaling. Importantly, we identified other pathways not previously associated with T2D, including the tight junction, complement and coagulation pathway, and antigen processing and presentation pathway.
The integration of pathways and eSNPs provides putative functional bridges between GWAS and candidate genes or pathways, thus serving as a potential powerful approach to identifying biological mechanisms underlying GWAS findings.
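The abstract does not state which enrichment statistic was used. As a hedged illustration of what "pathway enriched for eSNP genes associated with disease" can mean, the sketch below applies a one-sided hypergeometric over-representation test, a common choice for this kind of question but an assumption here. The function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    of which K belong to the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    # comb() returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def pathway_enrichment(background, pathway, disease_genes):
    """Over-representation p-value for one pathway.

    background:    all genes that have at least one eSNP (set)
    pathway:       genes annotated to the pathway (set)
    disease_genes: eSNP genes showing disease association (set)
    """
    pathway = pathway & background
    disease_genes = disease_genes & background
    overlap = len(pathway & disease_genes)
    p_value = hypergeom_upper_tail(overlap, len(pathway),
                                   len(disease_genes), len(background))
    return overlap, p_value


# Toy example with made-up gene labels g0..g99.
background = {f"g{i}" for i in range(100)}
pathway = {f"g{i}" for i in range(10)}
disease_genes = {f"g{i}" for i in range(5)} | {"g50", "g51", "g52"}
overlap, p_value = pathway_enrichment(background, pathway, disease_genes)
print(overlap, p_value)
```

In a multi-pathway analysis such as the 110 KEGG pathways mentioned above, the resulting p-values would still need a multiple-testing correction; that step is left out of this sketch.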
Genome-wide association studies (GWAS) have demonstrated the ability to identify the strongest causal common variants in complex human diseases. However, to date, the massive data generated from GWAS have not been maximally explored to identify true associations that fail to meet the stringent level of association required to achieve genome-wide significance. Genetics of gene expression (GGE) studies have shown promise towards identifying DNA variations associated with disease and providing a path to functionally characterize findings from GWAS. Here, we present the first empiric study to systematically characterize the set of single nucleotide polymorphisms associated with expression (eSNPs) in liver, subcutaneous fat, and omental fat tissues, demonstrating these eSNPs are significantly more enriched for SNPs that associate with type 2 diabetes (T2D) in three large-scale GWAS than a matched set of randomly selected SNPs. This enrichment for T2D association increases as we restrict to eSNPs that correspond to genes comprising gene networks constructed from adipose gene expression data isolated from a mouse population segregating a T2D phenotype. Finally, by restricting to eSNPs corresponding to genes comprising an adipose subnetwork strongly predicted as causal for T2D, we dramatically increased the enrichment for SNPs associated with T2D and were able to identify a functionally related set of diabetes susceptibility genes. We identified and validated malic enzyme 1 (Me1) as a key regulator of this T2D subnetwork in mouse and provided support for the association of this gene to T2D in humans. This integration of eSNPs and networks provides a novel approach to identify disease susceptibility networks rather than the single SNPs or genes traditionally identified through GWAS, thereby extracting additional value from the wealth of data currently being generated by GWAS.
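The comparison of eSNPs against randomly selected SNPs can be sketched as a resampling test. This is a simplified illustration, not the authors' procedure: the abstract refers to a matched set of random SNPs, whereas the code below draws unmatched random sets of the same size, and the p-value cutoff, number of draws, and toy data are all hypothetical.

```python
import random


def empirical_enrichment(gwas_p, esnp_ids, p_cutoff=0.05, n_draws=1000, seed=0):
    """Compare the fraction of eSNPs with small GWAS p-values against
    random SNP sets of the same size.

    gwas_p:   dict mapping SNP id -> GWAS p-value
    esnp_ids: list of SNP ids classified as eSNPs
    Returns (observed fraction, empirical p-value).
    """
    rng = random.Random(seed)
    all_ids = sorted(gwas_p)

    def hit_fraction(ids):
        return sum(gwas_p[i] < p_cutoff for i in ids) / len(ids)

    observed = hit_fraction(esnp_ids)
    size = len(esnp_ids)
    # Count random sets that look at least as enriched as the eSNP set.
    as_extreme = sum(
        hit_fraction(rng.sample(all_ids, size)) >= observed
        for _ in range(n_draws)
    )
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (as_extreme + 1) / (n_draws + 1)


# Toy data: 1,000 SNPs; the first 50 are "eSNPs", half of them with small p-values.
gwas_p = {i: (0.01 if i < 25 or (i >= 50 and i % 20 == 0) else 0.5)
          for i in range(1000)}
esnps = list(range(50))
observed, emp_p = empirical_enrichment(gwas_p, esnps)
print(observed, emp_p)
```

A faithful version would restrict each random draw to SNPs matched to the eSNPs on properties chosen by the analyst (the abstract does not list them), which would replace the plain `rng.sample` call.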
Environmental exposures filtered through the genetic make-up of each individual alter the transcriptional repertoire in organs central to metabolic homeostasis, thereby affecting arterial lipid accumulation, inflammation, and the development of coronary artery disease (CAD). The primary aim of the Stockholm Atherosclerosis Gene Expression (STAGE) study was to determine whether there are functionally associated genes (rather than individual genes) important for CAD development. To this end, two-way clustering was used on 278 transcriptional profiles of liver, skeletal muscle, and visceral fat (n = 66/tissue) and atherosclerotic and unaffected arterial wall (n = 40/tissue) isolated from CAD patients during coronary artery bypass surgery. The first step, across all mRNA signals (n = 15,042/12,621 RefSeqs/genes) in each tissue, resulted in a total of 60 tissue clusters (n = 3958 genes). In the second step (performed within tissue clusters), one atherosclerotic lesion (n = 49/48) and one visceral fat (n = 59) cluster segregated the patients into two groups that differed in the extent of coronary stenosis (P = 0.008 and P = 0.00015). The associations of these clusters with coronary atherosclerosis were validated by analyzing carotid atherosclerosis expression profiles. Remarkably, in one cluster (n = 55/54) relating to carotid stenosis (P = 0.04), 27 genes in the two clusters relating to coronary stenosis were confirmed (n = 16/17, P < 10^-27 and P < 10^-30). Genes in the transendothelial migration of leukocytes (TEML) pathway were overrepresented in all three clusters, referred to as the atherosclerosis module (A-module). In a second validation step, using three independent cohorts, the A-module was found to be genetically enriched with CAD risk by 1.8-fold (P < 0.004).
The transcription co-factor LIM domain binding 2 (LDB2) was identified as a potential high-hierarchy regulator of the A-module, a notion supported by subnetwork analysis, by cellular and lesion expression of LDB2, and by the expression of 13 TEML genes in Ldb2-deficient arterial wall. Thus, the A-module appears to be important for atherosclerosis development and, together with LDB2, merits further attention in CAD research.
Tibetans live on the highest plateau in the world; their current population is approximately 5 million, and most of them live at altitudes exceeding 3,500 m. Therefore, the Tibetan Plateau is a remarkable area for cultural and biological studies of human population history. However, the chronological profile of the Tibetan Plateau's colonization remains an unsolved question of human prehistory. To reconstruct the prehistoric colonization and demographic history of modern humans on the Tibetan Plateau, we systematically sampled 6,109 Tibetan individuals from 41 geographic populations across the entire region of the Tibetan Plateau and analyzed the phylogeographic patterns of both paternal (n = 2,354) and maternal (n = 6,109) lineages as well as genome-wide single nucleotide polymorphism markers (n = 50) in Tibetan populations. We found that there have been two distinct, major prehistoric migrations of modern humans into the Tibetan Plateau. The first migration was marked by ancient Tibetan genetic signatures dated to approximately 30,000 years ago, indicating that the initial peopling of the Tibetan Plateau by modern humans occurred during the Upper Paleolithic rather than the Neolithic. We also found evidence for relatively young (only 7-10 thousand years old) shared Y chromosome and mitochondrial DNA haplotypes between Tibetans and Han Chinese, suggesting a second wave of migration during the early Neolithic. Collectively, the genetic data indicate that Tibetans have been adapted to a high altitude environment since initial colonization of the Tibetan Plateau in the early Upper Paleolithic, before the last glacial maximum, followed by a rapid population expansion that coincided with the establishment of farming and yak pastoralism on the Plateau in the early Neolithic.
Background: The phylogeography of the Y chromosome in Asia previously suggested that modern humans of African origin initially settled in mainland southern East Asia, and about 25,000–30,000 years ago, migrated northward, spreading throughout East Asia. However, the fragmented distribution of one East Asian specific Y chromosome lineage (D-M174), which is found at high frequencies only in Tibet, Japan and the Andaman Islands, is inconsistent with this scenario. Results: In this study, we collected more than 5,000 male samples from 73 East Asian populations and reconstructed the phylogeography of the D-M174 lineage. Our results suggest that D-M174 represents an extremely ancient lineage of modern humans in East Asia, and a deep divergence was observed between northern and southern populations. Conclusion: We proposed that D-M174 has a southern origin and its northward expansion occurred about 60,000 years ago, predating the northward migration of other major East Asian lineages. The Neolithic expansion of Han culture and the last glacial maximum are likely the key factors leading to the current relic distribution of D-M174 in East Asia. The Tibetan and Japanese populations are the admixture of two ancient populations represented by two major East Asian specific Y chromosome lineages, the O and D haplogroups.
Genetic diversity data, from Y chromosome and mitochondrial DNA as well as recent genome-wide autosomal single nucleotide polymorphisms, suggested that mainland Southeast Asia was the major geographic source of East Asian populations. However, these studies also detected Central-South Asia (CSA)- and/or West Eurasia (WE)-related genetic components in East Asia, implying either recent population admixture or ancient migrations via the proposed northern route. To trace the time period and geographic source of these CSA- and WE-related genetic components, we sampled 3,826 males (116 populations from China and 1 population from North Korea) and performed high-resolution genotyping according to the well-resolved Y chromosome phylogeny. Our data, in combination with the published East Asian Y-haplogroup data, show that there are four dominant haplogroups (accounting for 92.87% of the East Asian Y chromosomes), O-M175, D-M174, C-M130 (not including C5-M356), and N-M231, in both southern and northern East Asian populations, which is consistent with the proposed southern route of modern human origin in East Asia. However, other haplogroups (6.79% in total) (E-SRY4064, C5-M356, G-M201, H-M69, I-M170, J-P209, L-M20, Q-M242, R-M207, and T-M70) were detected primarily in northern East Asian populations and were identified as being of Central-South Asian and/or West Eurasian origin based on the phylogeographic analysis. In particular, evidence of geographic distribution and Y chromosome short tandem repeat (Y-STR) diversity indicates that haplogroup Q-M242 (the ancestral haplogroup of the native American-specific haplogroup Q1a3a-M3) and R-M207 probably migrated into East Asia via the northern route. The age estimation of Y-STR variation within haplogroups suggests the existence of postglacial (∼18 Ka) migrations via the northern route as well as recent (∼3 Ka) population admixture.
We propose that although the Paleolithic migrations via the southern route played a major role in modern human settlement in East Asia, there are ancient contributions, though limited, from WE, which partly explain the genetic divergence between current southern and northern East Asian populations.