Snehal V Sambare scite author profile

2 Publications

4 Citation Statements Received

50 Citation Statements Given

Affiliations

Institute of Mathematical Sciences, George Mason University

Publications

THiCweed: fast, sensitive detection of sequence features by clustering big datasets

Agrawal, Sambare, Narlikar, et al. 2017

We present THiCweed, a new approach to analyzing transcription factor binding data from high-throughput chromatin immunoprecipitation-sequencing (ChIP-seq) experiments. THiCweed clusters bound regions using a divisive hierarchical clustering approach based on sequence similarity within sliding windows, while exploring both strands. THiCweed is specially geared toward data containing mixtures of motifs, which present a challenge to traditional motif-finders. Our implementation is significantly faster than standard motif-finding programs: it can process 30 000 peaks in 1–2 h on a single CPU core of a desktop computer. On synthetic data containing mixtures of motifs it is as accurate as or more accurate than all other tested programs. THiCweed performs best with large ‘window’ sizes (≥50 bp), much longer than typical binding sites (7–15 bp). On real data it successfully recovers literature motifs, and it also uncovers complex sequence characteristics in flanking DNA, variant motifs and secondary motifs, even when they occur in <5% of the input, all of which appear biologically relevant. We also find recurring sequence patterns across diverse ChIP-seq datasets, possibly related to chromatin architecture and looping. THiCweed thus goes beyond traditional motif finding to give new insights into genomic transcription factor-binding complexity.
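
The abstract describes comparing bound regions by sequence similarity within sliding windows on both strands. As an illustration only, the sketch below scores every window of a sequence, and of its reverse complement, against a position-specific probability profile and keeps the best-scoring hit. The log-odds scoring, the uniform 0.25 background, and the names `consensus_profile`, `window_log_odds` and `best_window_match` are assumptions made for this example; they are not THiCweed's published similarity measure or code.

```python
import math

# Maps each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_profile(consensus: str, match: float = 0.97) -> list[dict[str, float]]:
    """Hypothetical helper: one probability column per consensus base,
    spreading the leftover mass evenly over the other three bases."""
    other = (1.0 - match) / 3
    return [{b: (match if b == c else other) for b in "ACGT"} for c in consensus]


def window_log_odds(window: str, profile: list[dict[str, float]],
                    background: float = 0.25) -> float:
    """Log-odds of a window under the profile versus a uniform background.
    Bases absent from a column (e.g. 'N') get a tiny floor probability."""
    return sum(math.log(col.get(base, 1e-9) / background)
               for base, col in zip(window, profile))


def best_window_match(seq: str, profile: list[dict[str, float]]) -> tuple[float, int, str]:
    """Slide a profile-length window along both strands and return
    (best score, offset within that strand, strand)."""
    width = len(profile)
    best = (-math.inf, -1, "+")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(len(s) - width + 1):
            score = window_log_odds(s[offset:offset + width], profile)
            if score > best[0]:  # strict '>' keeps the first hit on ties
                best = (score, offset, strand)
    return best


if __name__ == "__main__":
    # "TTTTATCTTT" contains TATC, the reverse complement of GATA.
    score, offset, strand = best_window_match("TTTTATCTTT", consensus_profile("GATA"))
    print(strand, offset)  # prints: - 3
```

Reverse-strand offsets here are counted along the reverse complement rather than the original sequence. The clustering itself (how clusters are split and how their profiles are re-estimated) is not shown.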

THiCweed: fast, sensitive detection of sequence features by clustering big data sets

Agrawal, Sambare, Narlikar, et al. 2017

Preprint

We present THiCweed, a new approach to analyzing transcription factor binding data from high-throughput chromatin-immunoprecipitation-sequencing (ChIP-seq) experiments. THiCweed clusters bound regions using a divisive hierarchical clustering approach based on sequence similarity within sliding windows, while exploring both strands. THiCweed is specially geared towards data containing mixtures of motifs, which present a challenge to traditional motif-finders. Our implementation is significantly faster than standard motif-finding programs: it can process 30,000 peaks in 1–2 hours on a single CPU core of a desktop computer. On synthetic data containing mixtures of motifs it is as accurate as or more accurate than all other tested programs. THiCweed performs best with large "window" sizes (≥ 50 bp), much longer than typical binding sites (7–15 base pairs). On real data it successfully recovers literature motifs, and it also uncovers complex sequence characteristics in flanking DNA, variant motifs, and secondary motifs, even when they occur in < 5% of the input, all of which appear biologically relevant. We also find recurring sequence patterns across diverse ChIP-seq data sets, possibly related to chromatin architecture and looping. THiCweed thus goes beyond traditional motif-finding to give new insights into genomic TF binding complexity.
