Transcription factors (TFs) regulate gene expression by binding to specific sequences known as motifs. A bottleneck in our knowledge of gene regulation is the lack of functional characterization of TF motifs. This is mainly due to the large number of predicted TF motifs and the tissue specificity of TF binding. We built a framework to identify tissue-specific functional motifs (funMotifs) across the genome based on thousands of annotation tracks from large-scale genomics projects, including ENCODE, Roadmap Epigenomics and FANTOM. The annotations were weighted using a logistic regression model trained on regulatory elements obtained from massively parallel reporter assays. Overall, genome-wide predicted motifs of 519 TFs were characterized across fifteen tissue types. funMotifs summarizes the weighted annotations into a functional activity score for each predicted motif. funMotifs enabled us to measure the tissue specificity of different TFs. It also enabled us to identify candidate functional variants in TF motifs from the 1000 Genomes Project, the GTEx project, the GWAS catalogue, and 2,515 cancer samples from the Pan-cancer analysis of whole genome sequences (PCAWG) cohort. To enable researchers to annotate genomic variants or regions of interest, we have implemented a command-line pipeline and a web-based interface, publicly accessible at http://bioinf.icm.uu.se/funmotifs.
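The weighting step can be illustrated with a small sketch. It fits a logistic regression by plain batch gradient descent on toy data. Each row holds hypothetical binary annotation values for one regulatory element, and the label marks whether that element was active in a reporter assay. A motif is then scored as the sum of its annotation values, each weighted by the fitted coefficient. The track names, the toy data and the linear-sum form of the score are assumptions made for illustration. The abstract does not give the exact funMotifs scoring formula.

```python
import math
from typing import List, Tuple


def train_logistic_regression(
    X: List[List[float]], y: List[int], lr: float = 0.1, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "active"
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def activity_score(annotations: List[float], weights: List[float]) -> float:
    """Hypothetical score: annotation values weighted by fitted coefficients."""
    return sum(wt * a for wt, a in zip(weights, annotations))


# Toy training set. Columns are hypothetical tracks: [track_A, track_B, track_C];
# labels: 1 = active in the reporter assay, 0 = inactive.
X = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(X, y)
score_a = activity_score([1, 1, 0], weights)  # motif overlapping tracks A and B
score_b = activity_score([0, 1, 0], weights)  # motif overlapping track B only
print(weights, score_a > score_b)
```

In the toy data, track A alone separates active from inactive elements, so it receives the largest coefficient. A motif overlapping track A therefore outranks an otherwise identical motif that does not.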
Background: The strategies that influenza A viruses (IAVs) use to adapt to new hosts while crossing the species barrier are complex and not yet completely understood. Several published studies have identified singular genomic signatures that indicate such a host switch. The complexity of the problem suggested that, besides these singular signatures, combinations of such genomic features might define host adaptation in nature.
Results: We used computational rule-based modeling to identify combinatorial sets of interacting amino acid (aa) residues in 12 proteins of IAVs of the H1N1 and H3N2 subtypes. For each protein, we built highly accurate rule-based models that could differentiate between viral aa sequences from avian and human hosts. We found 68 host-specific combinations of aa residues, potentially associated with host adaptation, on the HA, M1, M2, NP, NS1, NEP, PA, PA-X, PB1 and PB2 proteins of the H1N1 subtype. We found 24 such combinations on the M1, M2, NEP, PB1 and PB2 proteins of the H3N2 subtype. In addition to these combinations, we found 132 novel singular aa signatures distributed among all proteins of both subtypes, including the newly discovered PA-X protein. The HA, NA, NP, NS1, NEP, PA-X and PA proteins of the H1N1 subtype carry H1N1-specific signatures. The HA, NA, PA-X, PA, PB1-F2 and PB1 proteins of the H3N2 subtype carry H3N2-specific signatures. The M1, M2, PB1-F2, PB1 and PB2 proteins of the H1N1 subtype carry H3N2 signatures in addition to H1N1 signatures. Similarly, the M1, M2, NP, NS1, NEP and PB2 proteins of the H3N2 subtype were shown to carry both H3N2 and H1N1 host-specific signatures (HSSs).
Conclusions: In summary, we computationally constructed simple IF-THEN rule-based models that could distinguish between aa sequences of avian and human IAVs. From the rules we identified HSSs with the potential to affect adaptation to specific hosts. The identification of combinatorial HSSs suggests that IAV adaptation to a new host is more complex than previously thought. The present study provides a basis for further detailed studies aimed at elucidating the molecular mechanisms underlying the adaptation process.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2919-4) contains supplementary material, which is available to authorized users.
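Below is a minimal sketch of how an ordered list of IF-THEN rules can label a sequence as avian or human. Each rule's IF-part is a conjunction of (aligned position, residue) conditions, so a combinatorial signature is simply a rule with more than one condition. The positions, residues and toy sequences are hypothetical, as is the first-match strategy for resolving rules. None of these are the rules or the rule-resolution scheme reported in the study.

```python
from typing import List, Tuple

Condition = Tuple[int, str]         # (1-based alignment position, residue)
Rule = Tuple[List[Condition], str]  # (IF-part conditions, THEN-part host label)


def rule_fires(conditions: List[Condition], sequence: str) -> bool:
    """True if every (position, residue) condition holds in the sequence."""
    return all(
        0 < pos <= len(sequence) and sequence[pos - 1] == aa
        for pos, aa in conditions
    )


def classify(sequence: str, rules: List[Rule], default: str = "unknown") -> str:
    """Return the host label of the first rule whose IF-part matches."""
    for conditions, host in rules:
        if rule_fires(conditions, sequence):
            return host
    return default


# Hypothetical rules, not taken from the study.
rules: List[Rule] = [
    ([(3, "K"), (7, "S")], "human"),  # combinatorial: two residues jointly
    ([(3, "E")], "avian"),            # singular: one residue
]

for seq in ["MSKAVLSG", "MSEAVLSG", "MSKAVLTG"]:
    print(seq, classify(seq, rules))
```

In this toy example, K at position 3 is not sufficient on its own. The third sequence carries it but fails the combined condition, so no rule assigns it a host. This illustrates how a combination of residues can carry information that a single residue does not.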
Background: Polybasic cleavage sites of the hemagglutinin (HA) proteins are considered the most important determinants of virulence in avian influenza viruses (AIVs). However, evidence is accumulating that these sites alone are not sufficient to establish high pathogenicity. Other sites must therefore contribute to pathogenicity, either on the HA protein outside the cleavage site or on the other proteins expressed by AIVs.
Results: We employed rule-based computational modeling to construct a map, with high statistical significance, of amino acid (AA) residues associated with pathogenicity in 11 proteins of the H5 type viruses. We found potential markers of pathogenicity in all 11 proteins expressed by the H5 type of AIV. The following AA mutations were identified as having the potential to shift pathogenicity from low to high: S-43HA1-D and D-83HA1-A in HA; S-269-D and E-41-H in NA; S-48-N and K-212-N in NS1; V-166-A in M1; G-14-E in M2; K-77-R and S-377-N in NP; and Q-48-P in PB1-F2. Our results suggest that low pathogenicity is common to most subtypes of H5 AIV, whereas high pathogenicity is specific to each subtype. The models were developed using public data and validated on new, unseen sequences.
Conclusions: Our models explicitly define a viral genetic background required for the virus to be highly pathogenic. They thus confirm the hypothesis that pathogenicity markers exist beyond the cleavage site.
Electronic supplementary material: The online version of this article (doi:10.1186/s12866-015-0465-x) contains supplementary material, which is available to authorized users.
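The mutation notation above can be read as four parts: wild-type residue, position, optional sub-region label and mutant residue. On that reading, S-43HA1-D means S replaced by D at position 43 of HA1. The sketch below parses this notation and reports which state a sequence carries at the site. This reading of the notation, the hypothetical `Mutation` type and the made-up sequence are assumptions for illustration. Real use would also require the numbering scheme the authors aligned their sequences to.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    wild_type: str  # residue on the low-pathogenicity side (our reading)
    position: int   # 1-based position in the referenced numbering
    region: str     # optional sub-region label such as "HA1"; "" if absent
    mutant: str     # residue on the high-pathogenicity side (our reading)


# residue, "-", position, optional region label, "-", residue
MUTATION_RE = re.compile(r"^([A-Z])-(\d+)([A-Za-z0-9]*)-([A-Z])$")


def parse_mutation(text: str) -> Mutation:
    """Parse notation such as 'S-43HA1-D' or 'K-77-R'."""
    match = MUTATION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised mutation notation: {text!r}")
    wt, pos, region, mut = match.groups()
    return Mutation(wt, int(pos), region, mut)


def residue_state(sequence: str, mutation: Mutation) -> str:
    """Classify the residue at the site as 'wild_type', 'mutant' or 'other'."""
    if not 0 < mutation.position <= len(sequence):
        return "other"  # site not covered by this sequence
    aa = sequence[mutation.position - 1]
    if aa == mutation.mutant:
        return "mutant"
    if aa == mutation.wild_type:
        return "wild_type"
    return "other"


hp_marker = parse_mutation("S-43HA1-D")
toy_ha1 = "A" * 42 + "D" + "A" * 10  # made-up sequence with D at position 43
print(hp_marker, residue_state(toy_ha1, hp_marker))
```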