Automated incorporation of pairwise dependency in transcription factor binding site prediction using dinucleotide weight tensors

Omidi, Saeed; Zavolan, Mihaela; Breda, Jérémie; Berger, Severin; Nimwegen, Erik van

doi:10.1371/journal.pcbi.1005176

Cited by 11 publications

(8 citation statements)

References 32 publications

“…An intrinsic limitation to PFMs/PWMs is that they ignore inter-nucleotide dependencies within TFBSs (9–13). TF–DNA interaction data derived from next-generation sequencing assays has improved the computational modeling of TF binding (14–19). For example, the TF flexible models (TFFMs) (14), based on first-order hidden Markov models, capture dinucleotide dependencies within TFBSs and were introduced in the 2016 release of the JASPAR database.…”

Section: Introduction

mentioning

confidence: 99%

JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework

Khan, Fornés, Stigliani et al. 2017

Nucleic Acids Research

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.

“…It is widely recognized that PWMs are not adequate representations of TFBS complexity, which contain significant positional correlations [18] and multiple efforts have been made to go beyond the PWM model [2][3][4][5]. As a complementary approach, we show that positional dependency effects can be examined by conditioning PSSVs on specific ancestral nucleotides.…”

Section: Multiple Tfbs Exhibit Positional Correlations Based On Ances...

mentioning

confidence: 99%

“…However, positional dependencies are known to occur in binding sites of many TFs. Extensions to the basic PWM format have been proposed to account for positional dependencies [2][3][4][5]. Nevertheless, PWMs persist as the dominant motif representation format, because of ease of calculation and because they are easily visualized via sequence logos [6].…”

mentioning

confidence: 99%

Position-specific evolution in transcription factor binding sites, and a fast likelihood calculation for the F81 model

Selvakumar, Siddharthan 2024

R. Soc. Open Sci.

Transcription factor binding sites (TFBS), like other DNA sequence, evolve via mutation and selection relating to their function. Models of nucleotide evolution describe DNA evolution via single-nucleotide mutation. A stationary vector of such a model is the long-term distribution of nucleotides, unchanging under the model. Neutrally evolving sites may have uniform stationary vectors, but one expects that sites within a TFBS instead have stationary vectors reflective of the fitness of various nucleotides at those positions. We introduce ‘position-specific stationary vectors’ (PSSVs), the collection of stationary vectors at each site in a TFBS locus, analogous to the position weight matrix (PWM) commonly used to describe TFBS. We infer PSSVs for human TFs using two evolutionary models (Felsenstein 1981 and Hasegawa-Kishino-Yano 1985). We find that PSSVs reflect the nucleotide distribution from PWMs, but with reduced specificity. We infer ancestral nucleotide distributions at individual positions and calculate ‘conditional PSSVs’ conditioned on specific choices of majority ancestral nucleotide. We find that certain ancestral nucleotides exert a strong evolutionary pressure on neighbouring sequence while others have a negligible effect. Finally, we present a fast likelihood calculation for the F81 model on moderate-sized trees that makes this approach feasible for large-scale studies along these lines.

“…Position weight matrices assume each substitution at a base pair position has an independent effect on the binding affinity of the protein to the motif and the magnitude of the effect is related to conservation of the base pair position in the frequency matrix (Stormo, 2000). There is no shortage of programs that can search sequences based on the position weight matrix (Frith, Li, & Weng, 2003; Kel et al, 2003; Tan & Lenhard, 2016; Wang, Martins, & Danko, 2016); however, the generation of a position frequency matrix can involve bias (Teytelman, Thurtle, Rine, & van Oudenaarden, 2013) and the assumption of independence of base pairs may often be unwarranted (Bulyk, Johnson, & Church, 2002; Man & Stormo, 2001; Omidi et al, 2017). The lack of independence is especially nontrivial in EREs, as loss of a perfect half site has a larger effect on the binding affinity than point mutations after the half site is lost (Deegan et al, 2011; Tyulmenkov & Klinge, 2001).…”

mentioning

confidence: 99%

erefinder: Genome‐wide detection of oestrogen response elements

Anderson, Jones 2019

Molecular Ecology Resources

Oestrogen response elements (EREs) are specific DNA sequences to which ligand‐bound oestrogen receptors (ERs) physically bind, allowing them to act as transcription factors for target genes. Locating EREs and ER responsive regions is therefore a potentially important component of the study of oestrogen‐regulated pathways. Here, we report the development of a novel software tool, erefinder, which conducts a genome‐wide, sliding‐window analysis of oestrogen receptor binding affinity. We demonstrate the effects of adjusting window size and highlight the program's general agreement with ChIP studies. We further provide two examples of how erefinder can be used for comparative approaches. erefinder can handle large input files, has settings to allow for broad and narrow searches, and provides the full output to allow for greater data manipulation. These features facilitate a wide range of hypothesis testing for researchers and make erefinder an excellent tool to aid in oestrogen‐related research.
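The independence assumption discussed in the citation statement for this entry can be made concrete with a toy example. The sketch below is illustrative only: the counts, the pseudocount, the uniform background, and the pair-correction table are all made-up assumptions, and `pair_score` is a hypothetical helper, not the dinucleotide weight tensor model of Omidi et al. nor anything implemented in erefinder. It shows that under a plain PWM the effect of a substitution at one position does not depend on its neighbour, whereas adding a pairwise term makes that effect context dependent.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert per-position count dicts into log2-odds weights (assumed uniform background)."""
    pwm = []
    for counts in pfm:
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((counts[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def pwm_score(pwm, site):
    """Independent-position score: a plain sum of per-position weights."""
    return sum(col[base] for col, base in zip(pwm, site))

def pair_score(pwm, pair_terms, site):
    """PWM score plus hypothetical correction terms for selected position pairs."""
    score = pwm_score(pwm, site)
    for (i, j), table in pair_terms.items():
        score += table.get(site[i] + site[j], 0.0)
    return score

# Made-up counts for a 3-bp site, for illustration only.
pfm = [
    {"A": 8, "C": 0, "G": 2, "T": 0},
    {"A": 0, "C": 6, "G": 4, "T": 0},
    {"A": 0, "C": 5, "G": 5, "T": 0},
]
pwm = pfm_to_pwm(pfm)

# Under independence, swapping G->C at position 1 changes the score by the
# same amount whether position 2 is C or G.
delta_with_c = pwm_score(pwm, "ACC") - pwm_score(pwm, "AGC")
delta_with_g = pwm_score(pwm, "ACG") - pwm_score(pwm, "AGG")

# Hypothetical pair terms that favour matching bases at positions 1 and 2.
pair_terms = {(1, 2): {"CC": 1.0, "GG": 1.0, "CG": -1.0, "GC": -1.0}}
pair_delta_with_c = pair_score(pwm, pair_terms, "ACC") - pair_score(pwm, pair_terms, "AGC")
pair_delta_with_g = pair_score(pwm, pair_terms, "ACG") - pair_score(pwm, pair_terms, "AGG")

print(delta_with_c, delta_with_g)            # equal up to rounding
print(pair_delta_with_c, pair_delta_with_g)  # differ once pair terms are added
```

With the pair table in place, the same G-to-C substitution raises the score next to a C and lowers it next to a G, which is the kind of positional dependency that a plain PWM cannot represent.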
