Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepening our understanding of proper gene regulation. Current approaches are mainly focused on searching for statistically enriched dots on a genome-wide map. However, given the availability of orthogonal data types such as ChIA-PET, HiChIP, Capture Hi-C, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here, we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. We compare Peakachu with current enrichment-based approaches, and find that Peakachu identifies a unique set of short-range interactions. We show that our models perform well in different platforms, across different sequencing depths, and across different species. We apply this framework to predict chromatin loops in 56 Hi-C datasets, and release the results at the 3D Genome Browser.
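The contrast drawn above, between scoring "enriched dots" and learning from labelled examples, can be illustrated with a minimal sketch. It is not Peakachu's implementation, nor that of any published loop caller: the function names, the ring sizes, the threshold of 2.0, and the toy matrix are all assumptions made for illustration. The first function scores a pixel against its local background, which is the general idea behind enrichment-based calling; the second flattens the surrounding window into a feature vector, which is the kind of input a supervised classifier such as a Random Forest could be trained on.

```python
def ring_enrichment(matrix, i, j, inner=1, outer=3):
    """Ratio of pixel (i, j) to the mean of a surrounding square ring.

    Cells within `inner` of the pixel are excluded so that the peak itself
    does not inflate the background estimate (hypothetical parameters).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    ring = []
    for a in range(max(0, i - outer), min(n_rows, i + outer + 1)):
        for b in range(max(0, j - outer), min(n_cols, j + outer + 1)):
            if max(abs(a - i), abs(b - j)) > inner:
                ring.append(matrix[a][b])
    if not ring:
        return 0.0
    background = sum(ring) / len(ring)
    return matrix[i][j] / background if background > 0 else 0.0


def window_features(matrix, i, j, half_width=1):
    """Flatten the square window centred on (i, j) into a feature list.

    Assumes the window lies fully inside the matrix.
    """
    return [
        matrix[a][b]
        for a in range(i - half_width, i + half_width + 1)
        for b in range(j - half_width, j + half_width + 1)
    ]


# Toy symmetric contact map: flat background with one strong off-diagonal dot.
size = 9
contact = [[1.0] * size for _ in range(size)]
contact[2][6] = contact[6][2] = 8.0

# Enrichment-style calling: keep upper-triangle pixels well above background.
candidates = [
    (i, j)
    for i in range(size)
    for j in range(i + 1, size)
    if ring_enrichment(contact, i, j) >= 2.0
]

# Supervised-style input: the same pixel described by its neighbourhood.
features = window_features(contact, 2, 6)
```

In a supervised setting, feature vectors like `features` would be paired with positive labels from an orthogonal assay (for example ChIA-PET or HiChIP interactions) and with sampled negatives before training a classifier.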
Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepening our understanding of proper gene regulation events. Current approaches mainly search for statistically enriched dots on a genome-wide map. However, given the availability of a wide variety of orthogonal data types such as ChIA-PET, GAM, SPRITE, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. Compared with current enrichment-based approaches, Peakachu identified more meaningful short-range interactions. We show that our models perform well on different platforms such as Hi-C, Micro-C, and DNA SPRITE, across different sequencing depths, and across different species. We applied this framework to systematically predict chromatin loops in 56 Hi-C datasets, and the results are available at the 3D Genome Browser (www.3dgenome.org).
Keywords: Chromatin looping, Hi-C, machine learning
... conformation of chromosomes 1. At kilobase to megabase scales, gene promoters are often connected to their distal regulatory elements, such as enhancers, through chromatin loops; rewiring of such loops has been implicated in developmental diseases and tumorigenesis 2,3. It has been shown that chromatin loops are mediated by the architectural proteins CTCF and cohesin via a loop extrusion model, where CTCF binds to a specific, non-palindromic motif in a "convergent" orientation at two sites, which act as loop anchors 4,5. A growing number of experimental techniques have been used to detect chromatin loops. Hi-C 6, a high-throughput derivative of Chromosome Conformation Capture (3C) 7, quantifies contacts between all possible pairs of genomic loci using a proximity-ligation procedure.
With an improved experimental protocol and deep sequencing, in-situ Hi-C 8 makes it possible to detect loops at kilobase resolution. By using micrococcal nuclease instead of restriction enzymes for chromatin fragmentation, Micro-C 9 further enables nucleosome-resolution analysis of chromatin interactions. Proximity-ligation techniques also include ChIA-PET 10, PLAC-Seq 11, and HiChIP 12, which detect loops bound by target proteins through chromatin immunoprecipitation steps, as well as Capture C 13 and Capture Hi-C 14, which enrich for interactions involving a given set of sequences. Recently, several ligation-free techniques have emerged to measure different aspects of chromatin organization. Genome Architecture Mapping (GAM) 15 quantifies chromatin contacts by sequencing DNA from a set of ultrathin nuclear sections at random orientations. Trac-looping 16 captures multiscale contacts by inserting a transposon linker between interacting regions. DNA SPRITE 17 follows a split-pool procedure to assign unique barcodes to individual complexes, with read pairs sharing identical barcodes treated similarly to contacts in Hi-C. Besides these biome...
It is well documented that codon usage biases affect gene translational efficiency; however, it is less well known whether viruses share their hosts' codon usage motifs. We determined that human-infecting viruses share codon usage biases with proteins expressed in the tissues the viruses infect. By performing 7,052,621 pairwise comparisons of human genes against genes from 113 viruses that infect humans, we determined which codon usage motifs were most highly correlated. We found that 16 viruses averaged a significant correlation in codon usage with over 500 human genes per viral gene, 58 viruses were highly correlated with an average of at least 100 human genes per viral gene, and 37 viruses were significantly correlated with an average of at least one human gene per viral gene at an alpha level of 7.09 × 10⁻⁹ (0.05 alpha / 7,052,621 comparisons). Only two viruses were not highly correlated with an average of one human gene per viral gene. While relatively few of the interactions were previously documented, the high statistical correlations suggest that researchers may be able to determine which tissues a virus is most likely to infect by analyzing codon usage biases.
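The significance threshold quoted above is a nominal alpha of 0.05 divided by the number of pairwise comparisons, which is the arithmetic of a Bonferroni correction. The short sketch below reproduces that division and checks that the four virus groups reported in the abstract add up to the 113 viruses studied. The function name and dictionary keys are illustrative labels, not names taken from the study.

```python
ALPHA = 0.05                 # nominal family-wise alpha stated in the abstract
N_COMPARISONS = 7_052_621    # pairwise human-gene vs. viral-gene comparisons


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_COMPARISONS)
print(f"per-comparison alpha: {threshold:.2e}")  # per-comparison alpha: 7.09e-09

# Virus counts per group as reported in the abstract (keys are illustrative).
virus_groups = {
    "over_500_human_genes_per_viral_gene": 16,
    "at_least_100_human_genes_per_viral_gene": 58,
    "at_least_1_human_gene_per_viral_gene": 37,
    "below_1_human_gene_per_viral_gene": 2,
}
total_viruses = sum(virus_groups.values())
print(f"viruses accounted for: {total_viruses}")  # viruses accounted for: 113
```

Dividing alpha by the number of tests is conservative: it bounds the chance of any false positive across all 7,052,621 comparisons at 0.05, at the cost of reduced power for each individual comparison.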
Isolates of the lactic acid bacterium Leuconostoc citreum are a major part of fermentation processes, especially in Korean kimchi. Here, we present the genome of L. citreum DmW_111, isolated from wild Drosophila melanogaster; analysis of this genome will expand the diversity of genome sequences for non-Lactobacillus spp. isolated from D. melanogaster.
shows low diversity is a risk factor for YOAD (Figure 2B, Chi-square test p = 0.04). After adjusting for APOE genotypes and gender, high SH (i.e. low heterozygosity) remained a significant risk factor for YOAD (p-value = 0.04, O.R. = 5.1 for every 1% increase of SH). Two pathways with FDR < 0.1 were the Heterotrimeric G-protein signaling pathway (p-value = 0.0017) and the Notch signaling pathway (p-value = 0.0038), but genetic diversity of these two classical AD pathways was not associated with age of onset (p-value = 0.33 and 0.70, respectively). Conclusions: Lower genomic diversity (reduced heterozygosity) was associated with enhanced risk for nonfamilial YOAD.