2017
DOI: 10.1093/nar/gkx1251

THiCweed: fast, sensitive detection of sequence features by clustering big datasets

Abstract: We present THiCweed, a new approach to analyzing transcription factor binding data from high-throughput chromatin immunoprecipitation-sequencing (ChIP-seq) experiments. THiCweed clusters bound regions based on sequence similarity using a divisive hierarchical clustering approach based on sequence similarity within sliding windows, while exploring both strands. THiCweed is specially geared toward data containing mixtures of motifs, which present a challenge to traditional motif-finders. Our implementation is si…

citations
Cited by 5 publications
(4 citation statements)
references
References 33 publications
“…Focusing on CTCF, we find that PSSVs inferred for our approach are strongest for genuine CTCF motif instances in CTCF ChIP-seq peak regions (information loss less than 10%); weaker for scrambled CTCF motifs in CTCF ChIP-seq peak regions, and for genuine CTCF motif instances in random genomic regions (information loss ≈ 15%–18%); and weakest for scrambled CTCF motifs in random genomic regions (information loss ≈ 23%–25%). It is known that there are frequently secondary motifs and sequence features in the neighbourhood of the TF core motif [25, 26], so it is plausible that an extended region around the core motif, including chance instances of scrambled motifs, may be under greater selection in ChIP-seq regions compared to random genomic regions. For a truly neutrally evolving region, one expects that the PSSV would be almost flat, but estimates of what fraction of the human genome is functional or under selective constraint range from 7% [27] to, controversially, 80% [28], with many authors agreeing that 10%–15% is plausible (see [29] for a discussion).…”
Section: Discussion (mentioning)
confidence: 99%
“…Focussing on CTCF, we find that PSSVs inferred for our approach are strongest for genuine CTCF motif instances in CTCF ChIP-seq peak regions (information loss < 10%); weaker for scrambled CTCF motifs in CTCF ChIP-seq peak regions, and for genuine CTCF motif instances in random genomic regions (information loss ≈ 15%–18%); and weakest for scrambled CTCF motifs in random genomic regions (information loss ≈ 23%–25%). It is known that there are frequently secondary motifs and sequence features in the neighbourhood of the TF core motif [25, 26], so it is plausible that an extended region around the core motif, including chance instances of scrambled motifs, may be under greater selection in ChIP-seq regions compared to random genomic regions. For a truly neutrally evolving region, one expects that the PSSV would be almost flat, but estimates of what fraction of the human genome is functional or under selective constraint range from 7% [27] to, controversially, 80% [28], with many authors agreeing that 10%–15% is plausible (see [29] for a discussion).…”
Section: Discussion (mentioning)
confidence: 99%
“…Motifs may be missed because they are present only in a small set of regions and therefore not statistically overrepresented in the entire set. A few recent methods do account for this by posing this as a clustering problem, with each cluster of sequences being dominated by a potentially different motif [11, 12]. However, these approaches do not take into account combinations of motifs, which may be critical to drive the biochemical activity at the region.…”
Section: Introduction (mentioning)
confidence: 99%