2019
DOI: 10.1186/s13059-018-1614-y

Accurate prediction of cell type-specific transcription factor binding

Abstract: Prediction of cell type-specific, in vivo transcription factor binding sites is one of the central challenges in regulatory genomics. Here, we present our approach that earned a shared first rank in the “ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge” in 2017. In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance. Finally, we provide 682 lists of …

Cited by 97 publications (127 citation statements)
References: 58 publications
“…Since auPRC is better at capturing the difference of prediction performances with imbalanced labels than auROC and recalls at different FDR levels, we choose auPRC as our primary benchmarking metric. Based on auPRC, our attention model has better performance on 69.23% (9/13) of the prediction targets than Anchor (Li, et al, 2019), 69.23% (9/13) than FactorNet (Quang and Xie, 2019), 76.92% (10/13) than Catchitt (Keilwagen, et al, 2019), and 92.31% (12/13) than Cheburashka (Lando, et al, 2016). Among all the methods, our method achieved the highest auPRC on 6 targets: CTCF/induced pluripotent stem cell (iPSC), FOXA1/liver, FOXA2/liver, GABPA/liver, HNF4A/liver, and REST/liver.…”
Section: Overall Benchmarking On Evaluation Data (mentioning)
confidence: 99%
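
The point in the statement above, that auPRC separates methods better than auROC when labels are imbalanced, can be illustrated with a minimal sketch. The function names, the simulated score distributions, and the 0.5% positive rate are assumptions chosen for illustration; they are not taken from the cited paper or from the challenge evaluation code.

```python
import random


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Ties in scores are broken by input order; this is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Simulated, heavily imbalanced data (assumed numbers): 50 bound bins, 10000 unbound bins.
rng = random.Random(0)
demo_labels = [1] * 50 + [0] * 10000
demo_scores = [rng.gauss(2.0, 1.0) for _ in range(50)] + [
    rng.gauss(0.0, 1.0) for _ in range(10000)
]

demo_auroc = auroc(demo_labels, demo_scores)
demo_auprc = average_precision(demo_labels, demo_scores)
print(f"auROC = {demo_auroc:.3f}, auPRC = {demo_auprc:.3f}")
```

In this simulated setting the classifier looks strong by auROC, while auPRC stays much lower because even a small false-positive rate among the many negatives swamps the few positives. That gap is why auPRC leaves more room to distinguish methods on imbalanced binding labels.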
“…There are 51676736 bins in total on training chromosomes in the labels, resulting in 51676736 × n potential training samples for each transcription factor, where n is the number of available cell types for training. Due to limited computing capacity, we use the iterative training process with downsampling the negatives (Keilwagen, et al, 2019; Quang and Xie, 2019). In each epoch, we first sample N_neg negative bins from all negative labels.…”
Section: Deep Neural Network Models With Attention Mechanism (mentioning)
confidence: 99%
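
The sample count and the per-epoch negative downsampling described in the statement above can be sketched as follows. This is a minimal illustration under stated assumptions: the quoted text is truncated after the sampling step, so keeping all positives in every epoch is an assumption, and `total_candidates` and `epoch_samples` are hypothetical helper names, not functions from any of the cited tools.

```python
import random

N_BINS = 51_676_736  # bins on the training chromosomes, as stated in the quote


def total_candidates(n_cell_types):
    """Potential training samples per transcription factor: bins times cell types."""
    return N_BINS * n_cell_types


def epoch_samples(labels, n_neg, rng):
    """Indices for one epoch: every positive plus n_neg freshly drawn negatives.

    Keeping all positives is an assumption; the quoted text only states that
    negatives are sampled anew in each epoch.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    drawn = rng.sample(negatives, min(n_neg, len(negatives)))
    batch = positives + drawn
    rng.shuffle(batch)
    return batch


# Toy labels (assumed): 5 bound bins among 100.
toy_labels = [1] * 5 + [0] * 95
rng = random.Random(0)
for epoch in range(3):
    idx = epoch_samples(toy_labels, n_neg=10, rng=rng)
    print(epoch, len(idx), sorted(idx)[:8])

# With a hypothetical n = 3 training cell types:
print(total_candidates(3))
```

Drawing a new negative subset each epoch lets the model see many different negatives over the course of training while each epoch stays small enough to fit the available compute.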
“…When TF ChIP-seq data is not available, TF binding motifs, used in combination with chromatin accessibility data or H3K27ac ChIP-seq data, might be used to infer TF binding sites [7,20,21]. Machine learning approaches that transfer models learnt from TF ChIP-seq peaks, motifs and DNase-seq data between cell types are promising ways of imputing TF cistromes, although imputation of TF binding sites on a large scale remains to be implemented [22-27]. Computationally imputed TF binding data is expected to represent TF binding sites less accurately than TF ChIP-seq experimental data, so we sought to develop a TR prediction method that could use imputed TF cistromes effectively, along with ChIP-seq derived ones.…”
Section: Introduction (mentioning)
confidence: 99%
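
The idea in the statement above, inferring binding sites from motifs combined with chromatin accessibility, can be sketched in its simplest form: score a position weight matrix along a sequence and keep only the hits that fall inside accessible regions. Everything here is a toy assumption for illustration (the four-position matrix, the uniform background, the threshold, and the function names); it is not the method of any cited paper, and it scans the forward strand only.

```python
import math

# Hypothetical 4-position motif with consensus ACGT (toy probabilities, not a real TF).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # uniform base frequency (assumption)


def log_odds(window):
    """Log2-odds score of a window against the uniform background."""
    return sum(math.log2(PWM[i][base] / BACKGROUND) for i, base in enumerate(window))


def motif_hits(seq, threshold):
    """(start, score) for each forward-strand window scoring at or above threshold."""
    width = len(PWM)
    hits = []
    for start in range(len(seq) - width + 1):
        score = log_odds(seq[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


def predict_sites(seq, accessible, threshold):
    """Start positions of motif hits fully contained in a half-open accessible interval."""
    width = len(PWM)
    return [
        start
        for start, _ in motif_hits(seq, threshold)
        if any(lo <= start and start + width <= hi for lo, hi in accessible)
    ]


sequence = "TTACGTGGGGACGTCC"
accessible_regions = [(0, 8)]  # e.g. one DNase-seq peak, in sequence coordinates
print(motif_hits(sequence, threshold=5.0))
print(predict_sites(sequence, accessible_regions, threshold=5.0))
```

Both motif matches score identically, but only the one inside the accessible interval is retained, which is how accessibility adds cell type specificity to an otherwise sequence-only prediction.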