Summary
Many common variants have been associated with hematological traits, but identifying causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low-frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease. We also provide evidence suggesting that previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.
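The summary does not say which Mendelian randomization estimator was used. As a hedged sketch of the general idea, the fixed-effect inverse-variance weighted (IVW) estimator, one common choice, combines per-variant Wald ratios (outcome effect divided by exposure effect) using first-order weights. The function name `ivw_estimate` and the input numbers below are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimate.

    Each variant j contributes a Wald ratio beta_outcome[j] / beta_exposure[j].
    The ratios are combined with first-order weights
    beta_exposure[j]**2 / se_outcome[j]**2, which assumes the variant-exposure
    effects are measured without error. Returns (estimate, standard error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have equal length")
    numerator = sum(bx * by / se**2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, 1 / sqrt(denominator)
```

For example, with two hypothetical variants, `ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])` gives an estimate of about 0.5, because each variant's Wald ratio is 0.5. Real analyses typically add sensitivity estimators (for example, MR-Egger or weighted median) to probe pleiotropy.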
Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.
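As a toy illustration of what disentangling indirect relationships means, the sketch below contrasts marginal and partial correlations for a hypothetical chain of three variables A → B → C. Partial correlations derived from the inverse covariance (precision) matrix are one standard way to separate direct from indirect couplings under Gaussian assumptions. Contact-prediction methods such as direct coupling analysis build on related ideas but apply more elaborate statistics to sequence alignments, so this is not any specific published method. The covariance matrix and function names are assumptions for illustration.

```python
import numpy as np


def correlations(cov: np.ndarray) -> np.ndarray:
    """Marginal (ordinary) correlation matrix from a covariance matrix."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


def partial_correlations(cov: np.ndarray) -> np.ndarray:
    """Partial correlations via the precision matrix P = inv(cov).

    rho_ij = -P_ij / sqrt(P_ii * P_jj). An entry near zero means i and j are
    conditionally independent given all the other variables.
    """
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical chain A -> B -> C, adding unit-variance noise at each step,
# so Var = (1, 2, 3) and every covariance involving A equals 1.
cov = np.array([[1.0, 1.0, 1.0],
                [1.0, 2.0, 2.0],
                [1.0, 2.0, 3.0]])

print(correlations(cov)[0, 2])          # A-C marginal correlation, about 0.577
print(partial_correlations(cov)[0, 2])  # A-C partial correlation, about 0
```

A and C look correlated only because both depend on B. Once B is conditioned on, the A-C coupling vanishes, while the direct A-B and B-C couplings remain.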
Human nervous system development is an intricate and protracted process that requires precise spatio-temporal transcriptional regulation. Here we generated tissue-level and single-cell transcriptomic data from up to sixteen brain regions covering prenatal and postnatal rhesus macaque development. Integrative analysis with complementary human data revealed that global intra-species (ontogenetic) and inter-species (phylogenetic) regional transcriptomic differences exhibit concerted cup-shaped patterns, with a late fetal-to-infancy (perinatal) convergence. Prenatal neocortical transcriptomic patterns revealed transient topographic gradients, whereas postnatal patterns largely reflected functional hierarchy. Genes exhibiting heterotopic and heterochronic divergence included those transiently enriched in the prenatal prefrontal cortex or linked to autism spectrum disorder and schizophrenia. Our findings shed light on transcriptomic programs underlying the evolution of human brain development and the pathogenesis of neuropsychiatric disorders.
Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.
Background
MicroRNA regulate mRNA levels in a tissue-specific way, either by inducing degradation of the transcript or by inhibiting translation or transcription. Putative mRNA targets of microRNA identified from seed sequence matches are available in many databases. However, such matches have a high false-positive rate and cannot capture the tissue specificity of regulation.
Results
We describe a simple method to identify direct mRNA targets of microRNA dysregulated in cancers from expression level measurements in patient-matched tumor/normal samples. The word "direct" is used here in a strict sense, meaning that: a) the mRNA has an exact seed sequence match to the microRNA in its 3'UTR; b) the seed sequence match is strictly conserved across the mouse, human, rat and dog genomes; c) the mRNA and microRNA expression levels distinguish tumor from normal with high significance; and d) the microRNA and mRNA expression levels are strongly and significantly anti-correlated in tumor and/or normal samples. We apply and validate the method using clear cell renal cell carcinoma (ccRCC) and matched normal kidney samples, limiting our analysis to mRNA targets whose transcripts undergo degradation because of a perfect seed sequence match. Dysregulated microRNA and mRNA are first identified by comparing their expression levels in tumor vs normal samples. Putative dysregulated microRNA/mRNA pairs are then identified from these using seed sequence matches, requiring that the seed sequence be conserved in the human, dog, rat and mouse genomes. These pairs are further pruned by requiring a strong anti-correlation signature in tumor and/or normal samples. The method revealed many new regulatory relationships in ccRCC.
For instance, loss of miR-149, miR-200c and miR-141 causes gain of function of oncogenes (KCNMA1, LOX), VEGFA and SEMA6A, respectively, and increased levels of miR-142-3p, miR-185, miR-34a, miR-224 and miR-21 cause loss of function of the tumor suppressors LRRC2, PTPN13, SFRP1, ERBB4 and (SLC12A1, TCF21), respectively. We also found strong anti-correlation between VEGFA and the miR-200 family of microRNA: miR-200a*, miR-200b, miR-200c and miR-141. Several identified microRNA/mRNA pairs were validated on an independent set of matched ccRCC/normal samples. The regulation of SEMA6A by miR-141 was verified by a transfection assay.
Conclusions
We describe a simple and reliable method to identify direct gene targets of microRNA in any cancer. The constraints we impose (a strong dysregulation signature for microRNA and mRNA levels between tumor and normal samples, evolutionary conservation of the seed sequence, and strong anti-correlation of expression levels) remove spurious matches and identify a subset of robust, tissue-specific, functional mRNA targets of dysregulated microRNA.
Interacting or functionally related protein families tend to have similar phylogenetic trees. Based on this observation, techniques have been developed to predict interaction partners. The observed degree of similarity between the phylogenetic trees of two proteins results from many factors besides the actual interaction or functional relationship between them, and these factors influence the performance of interaction predictions. One such factor is that a given protein interacts with many others and hence must adapt to all of them. Accordingly, the interaction or coadaptation signal within its tree is a composite of the influence of all of its interactors. Here, we introduce a new estimator of coevolution to overcome this and other problems. Instead of relying on the individual value of tree similarity between two proteins, we use the whole network of similarities between all pairs of proteins within a genome to reassess the similarity of that pair, thereby taking its coevolutionary context into account. We show that this approach substantially improves interaction prediction performance, providing a degree of accuracy/coverage comparable with, or in some cases better than, that of experimental techniques. Moreover, important information on the structure, function, and evolution of macromolecular complexes can be inferred with this methodology.
coevolution | interaction | mirrortree
Coevolution is a well-characterized process that takes place at all biological levels, from ecosystems to molecules. Coevolution between interacting protein families was originally proposed for some cases based on the qualitatively observed similarity of their phylogenetic trees (1, 2). This tree similarity was later quantified and statistically demonstrated to be related to protein interactions in large datasets of interacting families (3, 4).
This "mirrortree" approach has been adopted by many authors, who have developed various extensions of the method. Many of these extensions aim to correct for factors that influence tree similarity but are unrelated to the interaction, and that therefore degrade the predictive performance of the technique. For example, an obvious extension has been to include information on the phylogeny of the organisms involved, to correct for the "background similarity" expected for any pair of trees as a result of the underlying speciation events (5, 6). Still, many other factors affect the relationship between interactions and tree topology. Perhaps one of the most important is that a protein coevolves with many interactors simultaneously, which makes it difficult to separate the effect of each of them on the topology of its tree. Nevertheless, all of the methods developed to date treat pairs as isolated when evaluating their coevolution. Moreover, methods for predicting protein interactions based on tree similarities are prone to errors from several sources (e...
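The tree-similarity score and the two refinements discussed above (background correction and coevolutionary context) can be sketched with NumPy. This is an illustration under assumptions, not the authors' implementation. Each family's tree is represented by a matrix of inter-species distances over the same ordered species. The background correction shown here is a simple linear regression on species-tree distances, and the cited corrections (5, 6) may differ. The context score correlates two proteins' genome-wide similarity profiles, which is one plausible reading of the estimator described in the abstract. All function names are hypothetical.

```python
import numpy as np


def upper_triangle(dist: np.ndarray) -> np.ndarray:
    """Flatten the strictly upper triangle of a square distance matrix."""
    i, j = np.triu_indices_from(dist, k=1)
    return dist[i, j]


def mirrortree_similarity(dist_a: np.ndarray, dist_b: np.ndarray) -> float:
    """Pearson correlation between the inter-species distances of two families.

    Both matrices must list the same species in the same order.
    """
    return float(np.corrcoef(upper_triangle(dist_a), upper_triangle(dist_b))[0, 1])


def corrected_similarity(dist_a: np.ndarray, dist_b: np.ndarray,
                         species_dist: np.ndarray) -> float:
    """Mirrortree similarity after removing a linear species-tree background.

    Illustrative correction: regress each family's distances on the
    species-tree distances and correlate the residuals.
    """
    s = upper_triangle(species_dist)

    def residual(values: np.ndarray) -> np.ndarray:
        slope, intercept = np.polyfit(s, values, 1)
        return values - (slope * s + intercept)

    ra = residual(upper_triangle(dist_a))
    rb = residual(upper_triangle(dist_b))
    return float(np.corrcoef(ra, rb)[0, 1])


def context_similarity(sim: np.ndarray, i: int, j: int) -> float:
    """Compare proteins i and j through their coevolutionary profiles.

    Row k of `sim` holds the mirrortree similarity of protein k to every
    protein in the genome; the pair's own columns i and j are excluded so
    that only the shared context is compared.
    """
    mask = np.ones(sim.shape[0], dtype=bool)
    mask[[i, j]] = False
    return float(np.corrcoef(sim[i, mask], sim[j, mask])[0, 1])
```

The context score rewards pairs that coevolve with the same set of partners. This is how a genome-wide network of similarities can reassess an individual pair rather than treating it in isolation.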
Chemokines coordinate leukocyte trafficking by promoting oligomerization and signaling by G protein-coupled receptors; however, it is not known which amino acid residues of the receptors participate in this process. Bioinformatic analysis predicted that Ile52 in transmembrane region-1 (TM1) and Val150 in TM4 of the chemokine receptor CCR5 are key residues in the interaction surface between CCR5 molecules. Mutation of these residues generated nonfunctional receptors that could not dimerize or trigger signaling. In vitro and in vivo studies in human cell lines and primary T cells showed that synthetic peptides containing these residues blocked responses induced by the CCR5 ligand CCL5. Fluorescence resonance energy transfer showed the presence of preformed, ligand-stabilized chemokine receptor oligomers. This is the first description of the residues involved in chemokine receptor dimerization, and indicates a potential target for the modification of chemokine responses.