Assessing computational tools for the discovery of transcription factor binding sites

Tompa, Martin; Li, Nan; Bailey, Timothy L.; Church, George M.; Moor, Bart De; Eskin, Eleazar; Favorov, Alexander V.; Frith, Martin C.; Fu, Yutao; Kent, W. James; Makeev, Vsevolod J.; Миронов, А.; Noble, William Stafford; Pavesi, Giulio; Pesole, Graziano; Régnier, Mireille; Simonis, Nicolas; Sinha, Saurabh; Thijs, Gert; Helden, Jacques van; Vandenbogaert, Mathias; Weng, Zhiping; Workman, Christopher T.; Yuan, Chun; Zhu, Zhou

doi:10.1038/nbt1053

Cited by 1,113 publications

(1,184 citation statements)

References 17 publications

Supporting

Mentioning

1,154

Contrasting

Unclassified


“…This is in practice a complex task because the application domain may be skewed in two ways [4]. First, for many relevant bioinformatics problems the prevalence of positives in nature $q_P = (TP + FN)/(TP + TN + FP + FN)$ does not necessarily match the training set $q_P$ and is hard to estimate [2, 5]. Second, the yields (or costs) for correct and incorrect classification of positives and negatives in the machine learning paradigm ($Y_{TP}$, $Y_{TN}$, $Y_{FP}$, $Y_{FN}$) may be different from each other and highly context-dependent [1, 3].…”

Section: Introduction (mentioning)

confidence: 99%

Optimal threshold estimation for binary classifiers using game theory

Sánchez, 2016, F1000Res


Many bioinformatics algorithms can be understood as binary classifiers. They are usually trained by maximizing the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1-FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of “specificity equals sensitivity” maximizes robustness against uncertainties in the abundance of positives in nature and classification costs.
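The operating point described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`roc_points`, `minimax_threshold`, `accuracy`) and the toy scores and labels are hypothetical, chosen only to show a cutoff where sensitivity equals specificity. The sketch picks the empirical ROC point closest to the descending diagonal TPR = 1 - FPR and then checks that accuracy at that point does not depend on the assumed prevalence of positives.

```python
def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) per distinct score; predict positive if score >= threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def minimax_threshold(scores, labels):
    """Pick the cutoff whose ROC point lies closest to the descending diagonal TPR = 1 - FPR."""
    return min(roc_points(scores, labels), key=lambda p: abs(p[2] - (1.0 - p[1])))


def accuracy(q, fpr, tpr):
    """Accuracy at a ROC point when a fraction q of all cases is positive."""
    return q * tpr + (1.0 - q) * (1.0 - fpr)


# Toy data, invented for illustration only (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

threshold, fpr, tpr = minimax_threshold(scores, labels)
print(f"threshold={threshold}, FPR={fpr}, TPR={tpr}")

# On the diagonal, sensitivity equals specificity, so accuracy stays at 1 - FPR
# whatever prevalence q is assumed.
for q in (0.1, 0.5, 0.9):
    print(f"q={q}: accuracy={accuracy(q, fpr, tpr):.2f}")
```

On this toy data the selected cutoff is 0.6, where TPR = 0.75 and FPR = 0.25, so accuracy is 0.75 for every prevalence tried. With real scores the ROC curve is a step function and may not touch the diagonal exactly, which is why the sketch takes the nearest point instead of requiring equality.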


“…Computational prediction of cis-regulatory binding sites is widely acknowledged as a difficult task [1]. Binding sites are notoriously variable from instance to instance and they can be located considerable distances from the gene being regulated in higher eukaryotes.…”

Section: Introduction (mentioning)

confidence: 99%

Combining experts in order to identify binding sites in yeast and mouse genomic data

et al. 2008


Abstract. The identification of cis-regulatory binding sites in DNA is a difficult problem in computational biology. To obtain a full understanding of the complex machinery embodied in genetic regulatory networks it is necessary to know both the identity of the regulatory transcription factors together with the location of their binding sites in the genome. We show that using an SVM together with data sampling, to integrate the results of individual algorithms specialised for the prediction of binding site locations, can produce significant improvements upon the original algorithms. These results make more tractable the expensive experimental procedure of actually verifying the predictions.


“…A statistic comparing the accuracy of the main tools to discover TFBSs is found in Tompa [114], but it is very difficult to compare the performance of methods, in particular on complex genomes like the human genome.…”

Section: Promoter Analysis (mentioning)

confidence: 99%

Microarray data analysis and mining approaches

Cordero, Botta, Calogero, 2008, Briefings in Functional Genomics and Proteomics


Microarray based transcription profiling is now a consolidated methodology and has widespread use in areas such as pharmacogenomics, diagnostics and drug target identification. Large-scale microarray studies are also becoming crucial to a new way of conceiving experimental biology. A main issue in microarray transcription profiling is data analysis and mining. When microarrays became a methodology of general use, considerable effort was made to produce algorithms and methods for the identification of differentially expressed genes. More recently, the focus has switched to algorithms and database development for microarray data mining. Furthermore, the evolution of microarray technology is allowing researchers to grasp the regulative nature of transcription, integrating basic expression analysis with mRNA characteristics, i.e. exon-based arrays, and with DNA characteristics, i.e. comparative genomic hybridization, single nucleotide polymorphism, tiling and promoter structure. In this article, we will review approaches used to detect differentially expressed genes and to link differential expression to specific biological functions.

