Transcription factors activate gene expression in development, homeostasis, and stress with DNA binding domains and activation domains. Although there exist excellent computational models for predicting DNA binding domains from protein sequence (Stormo, 2013), models for predicting activation domains from protein sequence have lagged behind (Erijman et al., 2020; Ravarani et al., 2018; Sanborn et al., 2021), particularly in metazoans. We recently developed a simple and accurate predictor of acidic activation domains on human transcription factors (Staller et al., 2022). Here, we show how the accuracy of this human predictor arises from the balance between hydrophobic and acidic residues, which together are necessary for acidic activation domain function. When we combine our predictor with the predictions of neural network models trained in yeast, the intersection is more predictive than individual models, emphasizing that each approach carries orthogonal information. We synthesize these findings into a new set of activation domain predictions on human transcription factors.