Abstract: Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical …
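The independence assumption described in the abstract can be made concrete with a minimal sketch. The code below is not taken from the cited work: the function names (`pfm_to_pwm`, `score_site`), the pseudocount, the uniform background of 0.25 and the toy counts are all illustrative assumptions. It builds log-odds weights from per-position counts and scores a site as a plain sum over positions, which is exactly what makes the model blind to dependencies between positions.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    counts: list of dicts mapping each base to its count, one dict per position.
    pseudocount and background are illustrative assumptions.
    """
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((col[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_site(pwm, site):
    """Score a site as the sum of independent per-position weights."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the matrix width")
    return sum(col[base] for col, base in zip(pwm, site))

# Toy counts for a hypothetical 3 bp motif with consensus ACG.
toy_counts = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 1, "G": 8, "T": 0},
]
toy_pwm = pfm_to_pwm(toy_counts)

# Under this model, the cost of an A->T change at position 1 is the same
# whatever base sits at position 2; a dependency-aware model would not
# impose that constraint.
delta_with_c = score_site(toy_pwm, "ACG") - score_site(toy_pwm, "TCG")
delta_with_a = score_site(toy_pwm, "AAG") - score_site(toy_pwm, "TAG")
```

In this sketch `delta_with_c` and `delta_with_a` agree up to floating-point error, which is the additivity that the cited evidence for positional dependencies calls into question.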
“…An intrinsic limitation to PFMs/PWMs is that they ignore inter-nucleotide dependencies within TFBSs ( 9 – 13 ). TF–DNA interaction data derived from next-generation sequencing assays has improved the computational modeling of TF binding ( 14 – 19 ). For example, the TF flexible models (TFFMs) ( 14 ), based on first-order hidden Markov models, capture dinucleotide dependencies within TFBSs and were introduced in the 2016 release of the JASPAR database.…”
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.
“…It is widely recognized that PWMs are not adequate representations of TFBS complexity, which contain significant positional correlations [18] and multiple efforts have been made to go beyond the PWM model [2][3][4][5]. As a complementary approach, we show that positional dependency effects can be examined by conditioning PSSVs on specific ancestral nucleotides.…”
“…However, positional dependencies are known to occur in binding sites of many TFs. Extensions to the basic PWM format have been proposed to account for positional dependencies [2][3][4][5]. Nevertheless, PWMs persist as the dominant motif representation format, because of ease of calculation and because they are easily visualized via sequence logos [6].…”
Transcription factor binding sites (TFBS), like other DNA sequences, evolve via mutation and selection relating to their function. Models of nucleotide evolution describe DNA evolution via single-nucleotide mutation. A stationary vector of such a model is the long-term distribution of nucleotides, unchanging under the model. Neutrally evolving sites may have uniform stationary vectors, but one expects that sites within a TFBS instead have stationary vectors reflective of the fitness of various nucleotides at those positions. We introduce ‘position-specific stationary vectors’ (PSSVs), the collection of stationary vectors at each site in a TFBS locus, analogous to the position weight matrix (PWM) commonly used to describe TFBS. We infer PSSVs for human TFs using two evolutionary models (Felsenstein 1981 and Hasegawa-Kishino-Yano 1985). We find that PSSVs reflect the nucleotide distribution from PWMs, but with reduced specificity. We infer ancestral nucleotide distributions at individual positions and calculate ‘conditional PSSVs’ conditioned on specific choices of majority ancestral nucleotide. We find that certain ancestral nucleotides exert a strong evolutionary pressure on neighbouring sequence while others have a negligible effect. Finally, we present a fast likelihood calculation for the F81 model on moderate-sized trees that makes this approach feasible for large-scale studies along these lines.
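The notion of a stationary vector can be illustrated with the textbook form of the F81 model, in which the probability of observing base j after time t, starting from base i, is e^(-μt)·δ_ij + (1 - e^(-μt))·π_j. The sketch below is only an illustration of that standard formula, not the authors' inference or fast-likelihood code; the function names, the unnormalised rate `mu` and the example vector `pi` are assumptions.

```python
import math

BASES = "ACGT"

def f81_transition(pi, t, mu=1.0):
    """Return the F81 transition matrix P(t) as nested dicts.

    pi: stationary vector (dict base -> probability, summing to 1).
    mu: substitution rate, left unnormalised here (an assumption).
    """
    decay = math.exp(-mu * t)
    return {
        i: {j: decay * (1.0 if i == j else 0.0) + (1.0 - decay) * pi[j]
            for j in BASES}
        for i in BASES
    }

def propagate(dist, P):
    """Evolve a nucleotide distribution one step through matrix P."""
    return {j: sum(dist[i] * P[i][j] for i in BASES) for j in BASES}

# Hypothetical stationary vector for a position that strongly prefers A.
pi = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
P = f81_transition(pi, t=0.3)

# The stationary vector is unchanged by evolution under the model...
after = propagate(pi, P)

# ...while any other starting distribution relaxes towards it over long times.
uniform = {b: 0.25 for b in BASES}
relaxed = propagate(uniform, f81_transition(pi, t=50.0))
```

Here `after` equals `pi` up to rounding, and `relaxed` is numerically indistinguishable from `pi`, which is the sense in which a stationary vector is the long-term nucleotide distribution at a position.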
“…Position weight matrices assume each substitution at a base pair position has an independent effect on the binding affinity of the protein to the motif and the magnitude of the effect is related to conservation of the base pair position in the frequency matrix (Stormo, 2000). There is no shortage of programs that can search sequences based on the position weight matrix (Frith, Li, & Weng, 2003;Kel et al, 2003;Tan & Lenhard, 2016;Wang, Martins, & Danko, 2016); however, the generation of a position frequency matrix can involve bias (Teytelman, Thurtle, Rine, & van Oudenaarden, 2013) and the assumption of independence of base pairs may often be unwarranted (Bulyk, Johnson, & Church, 2002;Man & Stormo, 2001;Omidi et al, 2017). The lack of independence is especially nontrivial in EREs, as loss of a perfect half site has a larger effect on the binding affinity than point mutations after the half site is lost (Deegan et al, 2011;Tyulmenkov & Klinge, 2001).…”
Oestrogen response elements (EREs) are specific DNA sequences to which ligand‐bound oestrogen receptors (ERs) physically bind, allowing them to act as transcription factors for target genes. Locating EREs and ER responsive regions is therefore a potentially important component of the study of oestrogen‐regulated pathways. Here, we report the development of a novel software tool, erefinder, which conducts a genome‐wide, sliding‐window analysis of oestrogen receptor binding affinity. We demonstrate the effects of adjusting window size and highlight the program's general agreement with ChIP studies. We further provide two examples of how erefinder can be used for comparative approaches. erefinder can handle large input files, has settings to allow for broad and narrow searches, and provides the full output to allow for greater data manipulation. These features facilitate a wide range of hypothesis testing for researchers and make erefinder an excellent tool to aid in oestrogen‐related research.