Ammar Tareen scite author profile

Summary: Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA and protein sequences, yet it is currently difficult to generate and customize such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from either a matrix-like array of numbers or a multiple-sequence alignment. Logos are rendered as native matplotlib objects that are easy to stylize and incorporate into multi-panel figures.

Availability and implementation: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Documentation is provided at http://logomaker.readthedocs.io; source code is available at http://github.com/jbkinney/logomaker.
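To make the "matrix-like array of numbers" concrete, here is a minimal sketch in plain Python of how a small alignment can be turned into a per-position counts matrix and then into an information-content matrix, the kind of table a sequence logo draws. It does not use or reproduce Logomaker's own API: the function names `counts_matrix` and `information_matrix`, the toy alignment, and the DNA alphabet are all assumptions made for illustration, and no small-sample correction is applied.

```python
import math
from collections import Counter


def counts_matrix(sequences, alphabet="ACGT"):
    """Return one {character: count} dict per alignment column (hypothetical helper)."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for pos in range(length):
        column = Counter(seq[pos] for seq in sequences)
        # Characters outside the alphabet (e.g. gaps) are ignored.
        matrix.append({ch: column.get(ch, 0) for ch in alphabet})
    return matrix


def information_matrix(counts, alphabet="ACGT"):
    """Scale each column's frequencies by its information content in bits."""
    max_bits = math.log2(len(alphabet))
    info = []
    for column in counts:
        total = sum(column.values())
        if total == 0:
            # No usable characters in this column: contribute nothing.
            info.append({ch: 0.0 for ch in alphabet})
            continue
        freqs = {ch: column[ch] / total for ch in alphabet}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        # Letter height = frequency times (maximum entropy minus observed entropy).
        info.append({ch: p * (max_bits - entropy) for ch, p in freqs.items()})
    return info


# Toy alignment, invented for illustration only.
alignment = ["ACGT", "ACGA", "ACTT", "AGGT"]
counts = counts_matrix(alignment)
info = information_matrix(counts)
```

Each row of `info` corresponds to one position and each key to one character, so a fully conserved position carries the maximum of 2 bits for a four-letter alphabet. A logo-drawing library would take a table of this shape as input; consult the Logomaker documentation for the functions it actually provides.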

MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect

Tareen, Kooshkbaghi, Posfai, et al. 2022. Genome Biol.

Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps—including biophysically interpretable models—from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.

Evolution of DNA replication origin specification and gene silencing mechanisms

Tareen, Sheu, et al. 2020. Nat Commun.

DNA replication in eukaryotic cells initiates from replication origins that bind the Origin Recognition Complex (ORC). Origin establishment requires well-defined DNA sequence motifs in Saccharomyces cerevisiae and some other budding yeasts, but most eukaryotes lack sequence-specific origins. A 3.9 Å structure of S. cerevisiae ORC-Cdc6-Cdt1-Mcm2-7 (OCCM) bound to origin DNA revealed that a loop within Orc2 inserts into a DNA minor groove and an α-helix within Orc4 inserts into a DNA major groove. Using a massively parallel origin selection assay coupled with a custom mutual-information-based modeling approach, and a separate analysis of whole-genome replication profiling, here we show that the Orc4 α-helix contributes to the DNA sequence-specificity of origins in S. cerevisiae and Orc4 α-helix mutations change genome-wide origin firing patterns. The DNA sequence specificity of replication origins, mediated by the Orc4 α-helix, has co-evolved with the gain of ORC-Sir4-mediated gene silencing and the loss of RNA interference.

Density Estimation on Small Data Sets

2018

How might a smooth probability distribution be estimated, with accurately quantified uncertainty, from a limited amount of sampled data? Here we describe a field-theoretic approach that addresses this problem remarkably well in one dimension, providing an exact nonparametric Bayesian posterior without relying on tunable parameters or large-data approximations. Strong non-Gaussian constraints, which require a non-perturbative treatment, are found to play a major role in reducing distribution uncertainty. A software implementation of this method is provided.

Logomaker: beautiful sequence logos in Python