MPAthic: Quantitative Modeling of Sequence-Function Relationships for massively parallel assays

Ireland, William T.; Kinney, Justin B.

doi:10.1101/054676

Cited by 11 publications (11 citation statements)

References: 59 publications


“…The identified binding sites are further interrogated by performing information-based modeling with the Sort-Seq data. Here, we generate energy matrix models (13, 25) that describe the sequence-dependent energy of interaction of a transcription factor at each putative binding site. For each matrix, we use a convention that the wild-type sequence is set to have an energy of zero (an example energy matrix is in Fig.…”

Section: Results (mentioning)

confidence: 99%

Systematic approach for dissecting the molecular mechanisms of transcriptional regulation in bacteria

Belliveau, Barnes, Ireland, et al. (2018)

Proc. Natl. Acad. Sci. U.S.A.

Self-citation


Significance: Organisms must constantly make regulatory decisions in response to changes in cellular state or environment. However, while the catalog of sequenced genomes expands rapidly, we remain ignorant of how the genes in these genomes are regulated. Here, we show how a massively parallel reporter assay, Sort-Seq, combined with information-theoretic modeling can be used to identify regulatory sequences. We then use chromatography and mass spectrometry to identify the regulatory proteins that bind these sequences. The approach yields quantitative, base pair-resolution models of promoter mechanism and was demonstrated on both well-characterized and unannotated promoters in Escherichia coli. Because the approach is general, it opens up the possibility of quantitatively dissecting the mechanisms of promoter function in a wide range of bacteria.
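The first citation statement above describes energy matrix models with the convention that the wild-type sequence has zero energy. A minimal Python sketch of that convention follows, assuming an additive matrix stored as an L x 4 NumPy array (rows are binding-site positions, columns are A, C, G, T) in arbitrary units. The function names `energy` and `zero_wild_type` and the matrix values are hypothetical illustrations, not MPAthic's API.

```python
import numpy as np

BASES = "ACGT"  # assumed column order of the hypothetical energy matrix

def energy(matrix, seq):
    """Additive model: E(s) = sum over positions i of matrix[i, s_i]."""
    idx = [BASES.index(b) for b in seq]
    return float(matrix[np.arange(len(seq)), idx].sum())

def zero_wild_type(matrix, wt):
    """Shift each row so the wild-type base at that position contributes zero.

    Subtracting one constant per position changes every sequence's energy by
    the same total, so energy differences between sequences are preserved.
    """
    wt_idx = [BASES.index(b) for b in wt]
    offsets = matrix[np.arange(len(wt)), wt_idx]
    return matrix - offsets[:, None]

# Hypothetical 3-bp matrix, arbitrary units
theta = np.array([[0.5, 1.0, 2.0, 1.5],
                  [1.2, 0.2, 0.9, 0.7],
                  [0.3, 0.8, 0.1, 1.1]])
wt = "ACG"
theta0 = zero_wild_type(theta, wt)
print(energy(theta0, wt))     # 0.0
print(energy(theta0, "TCG"))  # 1.0, the A -> T cost at position 0
```

Because only energy differences are constrained by the data, fixing the wild-type energy at zero is a choice of reference point rather than a change to the model.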



“…[27,62], we use a Markov Chain Monte Carlo (MCMC) algorithm to infer a set of energy values (in arbitrary units) for each energy matrix position that maximizes the mutual information between binding site sequence and fluorescence bin. This inference is performed using the MPAthic software package [63].…”

Section: B. Bayesian Inference of Energy Matrix Models (mentioning)

confidence: 99%

Mapping DNA sequence to transcription factor binding energy in vivo

Barnes, Belliveau, Ireland, et al. (2018)

Preprint

Self-citation


Despite the central importance of transcriptional regulation in systems biology, it has proven difficult to determine the regulatory mechanisms of individual genes, let alone entire gene networks. It is particularly difficult to analyze a promoter sequence and identify the locations, regulatory roles, and energetic properties of binding sites for transcription factors and RNA polymerase. In this work, we present a strategy for interpreting transcriptional regulatory sequences using in vivo methods (i.e., the massively parallel reporter assay Sort-Seq) to formulate quantitative models that map a transcription factor binding site's DNA sequence to transcription factor-DNA binding energy. We use these models to predict the binding energies of transcription factor binding sites to within 1 k_BT of their measured values. We further explore how such a sequence-energy mapping relates to the mechanisms of transcriptional regulation in various promoter contexts. Specifically, we show that our models can be used to design specific induction responses, analyze the effects of amino acid mutations on DNA sequence preference, and determine how regulatory context affects a transcription factor's sequence specificity.
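The citation statement from Barnes et al. above describes choosing energy matrix values that maximize the mutual information between binding-site sequence and fluorescence bin. The Python sketch below shows only the quantity being maximized, assuming continuous model predictions (e.g. energies) are discretized into equal-count bins and using the simple plug-in estimator. MPAthic's own estimator and its MCMC sampler are not reproduced here, and `bin_by_quantile` and `mutual_information` are hypothetical names.

```python
import numpy as np

# Hypothetical helpers for illustration; not MPAthic's API or its estimator.

def bin_by_quantile(values, n_bins):
    """Discretize continuous model predictions into n_bins equal-count bins."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

def mutual_information(pred_bin, sort_bin):
    """Plug-in estimate, in bits, of I(predicted bin; fluorescence bin)."""
    pred_bin = np.asarray(pred_bin)
    sort_bin = np.asarray(sort_bin)
    # Joint count table over (predicted bin, sort bin), normalized to probabilities
    joint = np.zeros((pred_bin.max() + 1, sort_bin.max() + 1))
    np.add.at(joint, (pred_bin, sort_bin), 1)
    p = joint / joint.sum()
    # Product of the two marginals, built as an outer product of row and column sums
    expected = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    nz = p > 0  # empty cells contribute nothing (0 * log 0 is taken as 0)
    return float(np.sum(p[nz] * np.log2(p[nz] / expected[nz])))

# Toy data: predictions that separate the two sort bins perfectly carry 1 bit
energies = np.array([0.1, 0.2, 0.3, 0.4])
sort_bins = np.array([0, 0, 1, 1])
print(mutual_information(bin_by_quantile(energies, 2), sort_bins))  # 1.0
```

Mutual information is unchanged by any relabeling of the bins, so a model is rewarded for ranking sequences consistently with their sort bins regardless of the sign or scale of its energies; this is one reason an information-based objective suits data where the link between energy and fluorescence is not known in advance.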


“…General purpose methods are ones that can flexibly analyze data from a range of study designs. While it is often of interest to study the effect of sequence features on the estimated activity levels of MPRA sequences (using tools such as MPAthic (Ireland and Kinney, 2016)), typically some sort of differential analysis is needed first to group interesting sequences together. This would usually involve comparing the activity of each putative regulatory sequence of interest to a suitable negative control.…”

Section: Introduction (mentioning)

confidence: 99%

Linear models enable powerful differential activity analysis in massively parallel reporter assays

Myint, Avramopoulos, Goff, et al. (2017)

Preprint


Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. However, development of statistical analysis methods has not kept pace with the use of this assay. We present a linear model framework, mpralm, for the differential analysis of activity measures from these experiments that we show is calibrated and powerful. We show that it outperforms statistical tests commonly used in the literature, in the first comprehensive evaluation of statistical methods on several datasets. We investigate the theoretical and real-data properties of barcode summarization methods, and show an unappreciated impact of summarization method for some datasets. Finally, we perform a power analysis and show substantial improvements in power from using up to 6 replicates per condition, whereas sequencing depth has limited impact; we recommend always using at least 4 replicates. These results inform recommendations for differential analysis, general group comparisons, and power analysis. Our investigation of how statistical power depends on sample size and sequencing depth will help MPRA practitioners make informed study-design choices and lead to improved inference.
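The Myint et al. entry above mentions barcode summarization and a linear-model framework for differential activity. The Python sketch below illustrates both ideas under stated assumptions: activity is taken as a log2 RNA/DNA ratio with a pseudocount of 1, "aggregate" sums counts across barcodes before taking the ratio, "average" takes the mean of per-barcode log ratios, and differences between conditions are estimated with an unweighted ordinary least-squares fit. All function names are hypothetical, and this is a plain illustration of the linear-model idea, not the mpralm implementation.

```python
import numpy as np

# Hypothetical helpers illustrating the ideas in the abstract; this is not mpralm.

def aggregate_activity(dna, rna, pseudo=1.0):
    """log2((total RNA + pseudo) / (total DNA + pseudo)) across an element's barcodes."""
    return float(np.log2((np.sum(rna) + pseudo) / (np.sum(dna) + pseudo)))

def average_activity(dna, rna, pseudo=1.0):
    """Mean over barcodes of log2((RNA + pseudo) / (DNA + pseudo))."""
    dna = np.asarray(dna, dtype=float)
    rna = np.asarray(rna, dtype=float)
    return float(np.mean(np.log2((rna + pseudo) / (dna + pseudo))))

def condition_effect(activity, condition):
    """Unweighted OLS of activity = b0 + b1 * condition; returns (b1, t statistic)."""
    y = np.asarray(activity, dtype=float)
    x = np.asarray(condition, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + condition indicator
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    se_b1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1]), float(beta[1] / se_b1)

# A low-count barcode pulls the average estimator far from the aggregate one
dna, rna = [100, 1], [100, 7]
print(average_activity(dna, rna))    # 1.0 (mean of log2(1) and log2(4))
print(aggregate_activity(dna, rna))  # log2(108 / 102), about 0.08

# Replicate activities for one element in two conditions (toy values)
b1, t = condition_effect([1.0, 1.2, 0.8, 2.0, 2.2, 1.8], [0, 0, 0, 1, 1, 1])
print(round(b1, 3), round(t, 2))
```

Setting `condition` to 1 for an element's measurements and 0 for negative-control measurements turns the same fit into a comparison of the kind the citation statement describes, where each putative regulatory sequence is compared against a suitable negative control.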

