Sage M. Wright scite author profile

Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepening our understanding of proper gene regulation. Current approaches are mainly focused on searching for statistically enriched dots on a genome-wide map. However, given the availability of orthogonal data types such as ChIA-PET, HiChIP, Capture Hi-C, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here, we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. We compare Peakachu with current enrichment-based approaches, and find that Peakachu identifies a unique set of short-range interactions. We show that our models perform well in different platforms, across different sequencing depths, and across different species. We apply this framework to predict chromatin loops in 56 Hi-C datasets, and release the results at the 3D Genome Browser.

A supervised learning framework for chromatin loop detection in genome-wide contact maps

Salameh

Wang

Song

et al. 2019

Preprint

Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepening our understanding of proper gene regulation events. Current approaches are mainly focused on searching for statistically enriched dots on a genome-wide map. However, given the availability of a wide variety of orthogonal data types such as ChIA-PET, GAM, SPRITE, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. Compared with current enrichment-based approaches, Peakachu identified more meaningful short-range interactions. We show that our models perform well in different platforms such as Hi-C, Micro-C, and DNA SPRITE, across different sequencing depths, and across different species. We applied this framework to systematically predict chromatin loops in 56 Hi-C datasets, and the results are available at the 3D Genome Browser (www.3dgenome.org).

Keywords: Chromatin looping, Hi-C, machine learning

...conformation of chromosomes 1. At kilobase to megabase scales, gene promoters are often connected to their distal regulatory elements, such as enhancers, through chromatin loops; rewiring of such loops has been implicated in developmental diseases and tumorigenesis 2,3. It has been shown that chromatin loops are mediated by architectural proteins CTCF and cohesin via a loop extrusion model, where CTCF binds to a specific and non-palindromic motif in a "convergent" orientation at two sites, acting as loop anchors 4,5. A growing number of experiments have been used to detect chromatin loops. Hi-C 6, a high-throughput derivative of Chromosome Conformation Capture (3C) 7, quantifies contacts between all possible pairs of genomic loci using a proximity-ligation procedure.
With an improved experimental protocol and deep sequencing, in-situ Hi-C 8 makes it possible to detect loops at kilobase resolution. By introducing micrococcal nuclease for chromatin fragmentation instead of restriction enzymes, Micro-C 9 further enables nucleosome-resolution analysis of chromatin interactions. Proximity-ligation techniques also include ChIA-PET 10, PLAC-Seq 11, and HiChIP 12, which detect loops bound to target proteins through chromatin immunoprecipitation steps, as well as Capture C 13 and Capture Hi-C 14, which enrich interactions with a given set of sequences. Recently, several ligation-free techniques emerged to measure different aspects of chromatin organization. Genome Architecture Mapping (GAM) 15 quantifies chromatin contacts by sequencing DNA from a set of ultrathin nuclear sections at random orientations. Trac-looping 16 captures multiscale contacts by inserting a transposon linker between interacting regions. DNA SPRITE 17 follows a split-pool procedure to assign unique barcodes to individual complexes, with read pairs sharing identical barcodes treated similarly to contacts in Hi-C. Besides these biome...
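The DNA SPRITE description above (reads sharing a barcode treated like Hi-C contacts) can be illustrated with a minimal sketch. This is not taken from the Peakachu or SPRITE code; the function name `clusters_to_contacts`, the toy barcodes, and the integer bin indices are all hypothetical, and the sketch does not weight contacts by cluster size, which a real analysis may need.

```python
from collections import Counter
from itertools import combinations


def clusters_to_contacts(clusters):
    """Count pairwise contacts between genomic bins that share a barcode.

    `clusters` maps a barcode (hypothetical string ID) to the list of
    genomic bins observed in that complex.
    """
    contacts = Counter()
    for bins in clusters.values():
        # Every unordered pair of distinct bins in one complex is one contact.
        for a, b in combinations(sorted(set(bins)), 2):
            contacts[(a, b)] += 1
    return contacts


# Toy input (made up for illustration): barcode -> genomic bin indices.
clusters = {
    "bc1": [10, 12, 15],
    "bc2": [10, 12],
    "bc3": [40],  # a single-bin complex contributes no pairwise contact
}
contacts = clusters_to_contacts(clusters)
```

The resulting counter plays the role of a sparse contact matrix, which is the kind of input a loop caller working on contact maps would consume.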

Human viruses have codon usage biases that match highly expressed proteins in the tissues they infect

Miller

Hippen

Wright

et al. 2017

Biomed Genet Genomics

It is well documented that codon usage biases affect gene translational efficiency; however, it is less well known whether viruses share their host's codon usage motifs. We determined that human-infecting viruses have codon usage biases similar to those of proteins expressed in the tissues the viruses infect. By performing 7,052,621 pairwise comparisons of genes from humans versus genes from 113 viruses that infect humans, we determined which codon usage motifs were most highly correlated. We found that 16 viruses averaged a significant correlation in codon usage with over 500 human genes per viral gene, 58 viruses were highly correlated with an average of at least 100 human genes per viral gene, and 37 viruses were significantly correlated with an average of at least one human gene per viral gene at an alpha level of 7.09 × 10⁻⁹ (0.05 alpha / 7,052,621 comparisons). Only two viruses were not highly correlated with an average of one human gene per viral gene. While relatively few of the interactions were previously documented, the high statistical correlations suggest that researchers may be able to determine which tissues a virus is most likely to infect by analyzing codon usage biases.
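The alpha level quoted in the abstract is the family-wise alpha divided by the number of comparisons, which has the form of a Bonferroni correction (the abstract does not name the method, so that label is an assumption). A minimal sketch of the arithmetic, using only the figures stated above; the helper `is_significant` is hypothetical:

```python
# Per-comparison significance threshold, using the figures from the abstract.
family_alpha = 0.05        # family-wise alpha level
n_comparisons = 7_052_621  # pairwise human-gene vs. viral-gene comparisons

per_test_alpha = family_alpha / n_comparisons
print(f"{per_test_alpha:.2e}")  # prints 7.09e-09


def is_significant(p_value: float) -> bool:
    """Hypothetical helper: True if p_value is below the corrected threshold."""
    return p_value < per_test_alpha
```

Under this threshold a single correlation has to reach a p-value below roughly 7.09 × 10⁻⁹ before it counts as significant.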

Association study of rs3846662 with Alzheimer's disease in a population-based cohort: the Cache County Study

Wright

Jensen

Cockriel

et al. 2019

Neurobiology of Aging

Genome Sequence of Leuconostoc citreum DmW_111, Isolated from Wild Drosophila

et al. 2017

P3‐100: Association of Rs3846662 as a Genetic Modifier for Alzheimer's Disease: The Cache County Study

Wright

Jensen

Cockriel

et al. 2018

Alzheimer's & Dementia

shows low diversity is a risk factor for YOAD (Figure 2B, Chi-square test p = 0.04). After adjusting for APOE genotypes and gender, high SH (i.e. low heterozygosity) remained a significant risk factor for YOAD (p-value = 0.04, O.R. = 5.1 for every 1% increase of SH). Two pathways with FDR < 0.1 were the Heterotrimeric G-protein signaling pathway (p-value = 0.0017) and the Notch signaling pathway (p-value = 0.0038), but genetic diversity of these two classical AD pathways was not associated with age of onset (p-value = 0.33 and 0.70, respectively). Conclusions: Lower genomic diversity (reduced heterozygosity) was associated with enhanced risk for nonfamilial YOAD.
